CN117520492A - Intelligent question-answering optimization method, device and application based on document - Google Patents

Intelligent question-answering optimization method, device and application based on document

Info

Publication number
CN117520492A
Authority
CN
China
Prior art keywords
question
recall
sub
answer
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311406253.4A
Other languages
Chinese (zh)
Inventor
郁强
葛俊
彭大蒙
陈思瑶
曹喆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CCI China Co Ltd
Original Assignee
CCI China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CCI China Co Ltd filed Critical CCI China Co Ltd
Priority to CN202311406253.4A priority Critical patent/CN117520492A/en
Publication of CN117520492A publication Critical patent/CN117520492A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G06F40/253 Grammatical analysis; Style critique
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/30 Semantic analysis

Abstract

The application provides a document-based intelligent question-answering optimization method, device and application, comprising the following steps: constructing a document database and a question-answering model, obtaining a question in the same field as the document database, and decomposing the question into at least one sub-question; inputting each sub-question into the question-answering model, generating with the question-answering model a sub-answer corresponding to each sub-question, decomposing each sub-answer into a first recall result and a to-be-recalled result, and performing recall in the document database with the question-answering model using the to-be-recalled result to obtain a second recall result; and performing vector fusion on the first recall result, the second recall result and the question to obtain a fusion result, the question-answering model then obtaining an answer to the question based on the fusion result. By taking the recall results of parts of the sub-answers as a supplement, the scheme improves the accuracy of the generated answer.

Description

Intelligent question-answering optimization method, device and application based on document
Technical Field
The application relates to the field of natural language processing, in particular to an intelligent question-answering optimization method, device and application based on documents.
Background
With the development of artificial intelligence, domain-oriented document question-answering systems have attracted more and more attention. Traditional document question-answering systems are mainly based on keyword and rule matching, and therefore suffer from problems such as low accuracy and narrow coverage.
The prior art generally improves the accuracy of similar-text retrieval by raising the upper limit on the number of recall results. However, this greatly reduces the density of effective information fed into a large model and introduces noise from many unimportant but merely similar document clauses; in addition, because the model input has a length limit, excessive clause information is truncated and potentially useful knowledge is lost. A more accurate and efficient method is therefore needed to improve the accuracy and coverage of a domain-oriented document question-answering system, while also taking the scalability and maintainability of the system into account so that it can be adapted to the requirements of different fields and scenarios.
Disclosure of Invention
The embodiments of the present application provide a document-based intelligent question-answering optimization method, device and application, which decompose a question into a plurality of sub-questions and take the recall results of parts of the sub-answers as a supplement to improve the accuracy of the generated answers, thereby completing the optimization of intelligent question answering.
In a first aspect, an embodiment of the present application provides a document-based intelligent question-answering optimization method, where the method includes:
constructing a document database and a question-answering model, obtaining questions belonging to the same field as documents in the document database, segmenting each question to obtain parts of speech, and decomposing each question into at least one sub-question according to the parts of speech;
inputting each sub-question into the question-answering model, wherein the question-answering model is trained to retrieve and recall in the document database according to each sub-question and to generate a sub-answer corresponding to each sub-question; decomposing each sub-answer into a first recall result and a to-be-recalled result according to a set recall threshold, wherein the part of the sub-answer whose word segments have a generation probability greater than or equal to the recall threshold is taken as the first recall result, and the part whose word segments have a generation probability smaller than the recall threshold is taken as the to-be-recalled result; and performing recall in the document database with the question-answering model using the to-be-recalled result to obtain a second recall result;
and carrying out vector fusion on the first recall result, the second recall result and the question to obtain a fusion result, and obtaining a question answer based on the fusion result by the question-answering model.
In a second aspect, an embodiment of the present application provides a document-based intelligent question-answering optimization device, including:
a construction module, configured to: construct a document database and a question-answering model, obtain questions belonging to the same field as the documents in the document database, segment each question to obtain parts of speech, and decompose each question into at least one sub-question according to the parts of speech;
a recall module, configured to: input each sub-question into the question-answering model, wherein the question-answering model is trained to retrieve and recall in the document database according to each sub-question and to generate a sub-answer corresponding to each sub-question; decompose each sub-answer into a first recall result and a to-be-recalled result according to a set recall threshold, wherein the part of the sub-answer whose word segments have a generation probability greater than or equal to the recall threshold is taken as the first recall result, and the part whose word segments have a generation probability smaller than the recall threshold is taken as the to-be-recalled result; and perform recall in the document database with the question-answering model using the to-be-recalled result to obtain a second recall result; and
a response module, configured to: perform vector fusion on the first recall result, the second recall result and the question to obtain a fusion result, and obtain, by the question-answering model, an answer to the question based on the fusion result.
In a third aspect, embodiments of the present application provide an electronic device comprising a memory having a computer program stored therein and a processor configured to run the computer program to perform a document-based intelligent question-answering optimization method.
In a fourth aspect, embodiments of the present application provide a readable storage medium having a computer program stored therein, the computer program comprising program code for controlling a process to execute a process, the process comprising the document-based intelligent question-answering optimization method.
The main contributions and innovation points of the invention are as follows:
In the method and device of the present application, noun phrases are constructed from syntactic dependencies and lexical attributes, and the hierarchical structure of the syntactic dependency tree is used to build information associations among parallel noun phrases; the user's question is decomposed by constructing a grammar tree and the result is then embedded into the question-answering model, which helps the question-answering model understand the question posed by the user to the greatest extent, makes the retrieved answers more complete, and avoids the loss of retrieved knowledge. In addition, the scheme decomposes each sub-answer by setting a recall threshold and performs a further recall on the to-be-recalled part obtained from the decomposition to supplement the original question, thereby compensating for answer errors caused by the question-answering model lacking part of the knowledge.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the other features, objects, and advantages of the application.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:
FIG. 1 is a flow chart of a document-based intelligent question-answering optimization method according to an embodiment of the present application;
FIG. 2 is a block diagram of a document-based intelligent question-answering optimization device according to an embodiment of the present application;
fig. 3 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with one or more embodiments of the present specification. Rather, they are merely examples of apparatus and methods consistent with aspects of one or more embodiments of the present description as detailed in the accompanying claims.
It should be noted that: in other embodiments, the steps of the corresponding method are not necessarily performed in the order shown and described in this specification. In some other embodiments, the method may include more or fewer steps than described in this specification. Furthermore, individual steps described in this specification, in other embodiments, may be described as being split into multiple steps; while various steps described in this specification may be combined into a single step in other embodiments.
Example 1
The embodiment of the application provides an intelligent question-answering optimization method, device and application based on a document, and specifically, referring to fig. 1, the method comprises the following steps:
constructing a document database and a question-answering model, obtaining questions belonging to the same field as documents in the document database, segmenting each question to obtain parts of speech, and decomposing each question into at least one sub-question according to the parts of speech;
inputting each sub-question into the question-answering model, wherein the question-answering model is trained to retrieve and recall in the document database according to each sub-question and to generate a sub-answer corresponding to each sub-question; decomposing each sub-answer into a first recall result and a to-be-recalled result according to a set recall threshold, wherein the part of the sub-answer whose word segments have a generation probability greater than or equal to the recall threshold is taken as the first recall result, and the part whose word segments have a generation probability smaller than the recall threshold is taken as the to-be-recalled result; and performing recall in the document database with the question-answering model using the to-be-recalled result to obtain a second recall result;
and carrying out vector fusion on the first recall result, the second recall result and the question to obtain a fusion result, and obtaining a question answer based on the fusion result by the question-answering model.
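For orientation, the three steps above can be wired together in a small end-to-end sketch. Every function below is a simplified stand-in introduced for illustration only (an assumption, not the patent's implementation); the actual sub-steps are elaborated in the remainder of this embodiment.

```python
# Hypothetical end-to-end skeleton of the optimization flow. decompose() stands in for
# the grammar-tree decomposition described below, answer_with_probs() for the
# question-answering model's generation step, and recall() for document-database recall.

def decompose(question):
    return [question]                      # placeholder: the real version builds sub-questions

def answer_with_probs(sub_question):
    # placeholder: (word segment, generation probability) pairs from the QA model
    return [("high-confidence segment", 0.9), ("low-confidence segment", 0.4)]

def recall(query, top_k=3):
    return [f"passage recalled for: {query}"]   # placeholder document-database recall

def optimized_recall(question, threshold=0.7):
    first_recall, second_recall = [], []
    for sq in decompose(question):
        segments = answer_with_probs(sq)
        first_recall += [w for w, p in segments if p >= threshold]   # first recall result
        pending = " ".join(w for w, p in segments if p < threshold)  # to-be-recalled result
        if pending:
            second_recall += recall(pending)                         # second recall result
    # The first recall results, the second recall results and the question are then
    # vector-fused and passed back to the QA model (sketched later in this embodiment).
    return first_recall, second_recall

print(optimized_recall("example question"))
```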
In this scheme, in the step of segmenting the question to obtain parts of speech and decomposing the question into at least one sub-question according to the parts of speech, the question is segmented into words to obtain their parts of speech, and syntactic structure units corresponding to the subject, the predicate and the object are obtained according to the syntactic dependencies of the question, wherein the syntactic dependencies include the subject-predicate relationship, the adverbial-head relationship and the verb-object relationship; the question is divided into a plurality of levels according to the different syntactic dependencies in the question, each level corresponding to one syntactic dependency; within each level, different vocabularies in the current syntactic structure unit are combined according to the lexical relationships in each syntactic structure unit to obtain new syntactic structure units, and the new syntactic structure units are combined according to the syntactic dependencies to serve as sub-questions.
Regarding syntactic dependencies:
main and predicate relationship: the relation between a subject and a predicate in a sentence, wherein the subject is a main body of a statement, the predicate is a verb for stating the predicate, and in the structural relation, the predicate is directly placed behind the subject without an intermediary between the subject and the predicate; the main language and the predicates can be identified and acquired by identifying the main-predicate relation;
relationship in the shape: modifying predicate verbs through the idioms to express states or conditions of action occurrence; the relationship in the recognition state can be used for recognizing and acquiring the state and the predicate;
dynamic guest relation: the predicate and the object have a pairing relation, and the animal is a component with the object; the predicate and object may be identified by identifying the guest-to-predicate relationship.
In the step of segmenting the question to obtain parts of speech and obtaining syntactic structure units corresponding to the subject, the predicate and the object according to the syntactic dependencies of the question, the question is segmented into words to obtain the part of speech of each word, and the subject-predicate, adverbial-head and verb-object relationships are identified by combining the parts of speech of the words with the syntactic dependencies of the question; the noun or noun phrase in the subject-predicate relationship is identified as the subject, the verb in the identified relationships is the predicate, and the noun or noun phrase paired with the predicate in the verb-object relationship is identified as the object.
In some embodiments, parts of speech include one or more of nouns, verbs, adjectives, adverbs, numbers, pronouns, articles, prepositions, conjunctions, and interjections.
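As an illustrative sketch only (not part of the patent's disclosure), picking out the subject, predicate and object from part-of-speech tags and dependency labels might look like the fragment below; the tuple format and the label names ("nsubj", "dobj", "root") follow a Universal-Dependencies-style convention and are assumptions, since the text does not prescribe a particular parser.

```python
# Hypothetical sketch: extract subject / predicate / object from a dependency parse.
# Each token is (word, part_of_speech, dependency_label); the label set is assumed.

def extract_spo(tokens):
    """Return (subjects, predicates, objects) word lists from parsed tokens."""
    subjects, predicates, objects = [], [], []
    for word, pos, dep in tokens:
        if dep == "nsubj":                     # subject-predicate relation -> subject
            subjects.append(word)
        elif pos == "VERB" and dep == "root":  # head verb modified by the adverbial -> predicate
            predicates.append(word)
        elif dep == "dobj":                    # verb-object relation -> object
            objects.append(word)
    return subjects, predicates, objects

# Toy usage with the (translated) example question from the text:
tokens = [
    ("Chairman Wang", "NOUN", "nsubj"),
    ("Professor Li", "NOUN", "nsubj"),
    ("how", "ADV", "advmod"),
    ("evaluate", "VERB", "root"),
    ("whether Hangzhou and Anyang have the conditions to be identified as ancient capitals",
     "NOUN", "dobj"),
]
print(extract_spo(tokens))
```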
It should be noted that each question is obtained by combining the syntactic dependencies corresponding to the subject, the predicate and the object: the subject-predicate relationship corresponds to a noun phrase and the predicate, the adverbial-head relationship corresponds to an adverbial and the predicate, and the verb-object relationship corresponds to the predicate and a noun phrase. For example, in the question "How do Chairman Wang and Professor Li evaluate whether Hangzhou and Anyang have the conditions to be identified as ancient capitals", "Chairman Wang and Professor Li evaluate" is a subject-predicate relationship, so "Chairman Wang and Professor Li" are identified as the subject; "how ... evaluate" is an adverbial-head relationship, so "evaluate" is identified as the predicate; and "whether Hangzhou and Anyang have the conditions to be identified as ancient capitals" is a verb-object relationship in the syntactic dependencies, so "Hangzhou and Anyang have the conditions to be identified as ancient capitals" is identified as the object.
In the step of dividing the question into a plurality of levels according to the different syntactic dependencies in the question, whether each syntactic structure unit itself contains a syntactic dependency is identified; if it does, clause structure units corresponding to the subject, the predicate and the object are obtained according to the parts of speech of the words in that syntactic structure unit and its syntactic dependencies, and the at least one clause structure unit forms the next level below the current syntactic structure unit. For example, since the object "Hangzhou and Anyang have the conditions to be identified as ancient capitals" contains a syntactic dependency, the verb-object relationship within it is further split to obtain the subject "Hangzhou and Anyang", the predicate "have", and the object "the conditions to be identified as ancient capitals".
The lexical relationships include, but are not limited to, the parallel relationship, the centering relationship and the main-term relationship. It should be noted that each syntactic structure unit contains at least one vocabulary, and the vocabularies form the current syntactic structure unit through lexical relationships; therefore, after a syntactic structure unit is obtained, the scheme further splits it based on these lexical relationships.
With respect to lexical relations:
parallel relation: the words connected before and after the parallel words are in parallel relation;
selection relation: the words connected before and after the selected word are in a selection relation;
centering relationship: the relationship between a noun and its intended object is expressed by a definite word modifying noun.
In the step of combining different vocabularies in the current syntactic structure unit according to the lexical relationships in each syntactic structure unit at each level to obtain new syntactic structure units, words are obtained according to the part of speech and the lexical relationship of each vocabulary in each syntactic structure unit, the different vocabularies of the current syntactic structure unit are obtained by combining these words, and the different vocabularies form the new syntactic structure units.
Specifically, whether a parallel relationship or a selection relationship exists in each syntactic structure unit is identified; if so, the at least two nouns and the at least two attributives connected by the parallel or selection relationship are taken as words, and each noun is combined with its corresponding attributive to obtain the different vocabularies.
For example, "Chairman Wang" and "Professor Li" are identified as being in a parallel relationship, "Chairman Wang" and "Professor Li" are the subject, and "Wang" and "Li" are attributives; "Wang" is therefore combined with "chairman" into "Chairman Wang" and "Li" with "professor" into "Professor Li" as new syntactic structure units. "Hangzhou" and "Anyang" are identified as being in a parallel relationship; they are subjects without attributives, so "Hangzhou" and "Anyang" themselves are taken as new syntactic structure units.
In the step of combining the new syntactic structure units according to the syntactic dependencies to form the sub-questions, the new syntactic structure units at the lowest level are combined according to the syntactic dependencies to obtain the new syntactic structure units of the level above; the syntactic structure units of the different levels are combined level by level from bottom to top, and the syntactic structure units of the uppermost level are combined according to the syntactic dependencies to form the sub-questions.
Specifically, according to the subject-predicate, adverbial-head and verb-object relationships, the syntactic structure units corresponding to the subject, the predicate and the object are combined into a short sentence, and the short sentence is taken as a syntactic structure unit of the level above and combined in turn to obtain the sub-questions.
For example, under the object "Hangzhou and Anyang have the conditions to be identified as ancient capitals" there is a lower level: "Hangzhou" and "Anyang" are the syntactic structure units corresponding to the subject, "have" is the syntactic structure unit corresponding to the predicate, and "the conditions to be identified as ancient capitals" is the syntactic structure unit corresponding to the object. Combining the syntactic structure units of this level according to the syntactic dependencies yields the new syntactic structure units "whether Hangzhou has the conditions to be identified as an ancient capital" and "whether Anyang has the conditions to be identified as an ancient capital". Correspondingly, "Chairman Wang" and "Professor Li" are new syntactic structure units and "evaluate" is a new syntactic structure unit; combining the new syntactic structure units of the current level then yields the sub-questions "How does Chairman Wang evaluate whether Hangzhou has the conditions to be identified as an ancient capital", "How does Chairman Wang evaluate whether Anyang has the conditions to be identified as an ancient capital", "How does Professor Li evaluate whether Hangzhou has the conditions to be identified as an ancient capital" and "How does Professor Li evaluate whether Anyang has the conditions to be identified as an ancient capital".
Specifically, when a question cannot be decomposed, the sub-question is identical to the question.
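The bottom-up combination just illustrated can be sketched in code (an illustrative assumption, not the patent's implementation): parallel noun phrases at each level are expanded by Cartesian product, so a question with two parallel subjects and two parallel clause subjects yields four sub-questions.

```python
# Hypothetical sketch: expand parallel noun phrases into sub-questions by combining
# syntactic structure units level by level. The phrases and the sentence template are
# illustrative assumptions based on the worked example in the text.
from itertools import product

subjects = ["Chairman Wang", "Professor Li"]     # parallel subjects at the top level
predicate = "evaluate"                           # predicate ("how ... evaluate")
clause_subjects = ["Hangzhou", "Anyang"]         # parallel subjects inside the object clause
clause_rest = "has the conditions to be identified as an ancient capital"

# Lower level: combine the object clause's own subject / predicate / object units.
object_clauses = [f"whether {s} {clause_rest}" for s in clause_subjects]

# Upper level: combine each top-level subject with the predicate and each object clause.
sub_questions = [f"How does {subj} {predicate} {clause}?"
                 for subj, clause in product(subjects, object_clauses)]

for q in sub_questions:     # prints four sub-questions
    print(q)
```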
Further, the hierarchical relationships of the corresponding question are expressed in the form of a grammar tree; if a syntactic dependency exists in a syntactic structure unit of the current level, the next level below the current syntactic dependency is obtained by further processing.
Specifically, the scheme uses the hierarchical structure of the grammar tree to construct information associations between noun phrases in a parallel relationship and to refine the question.
Specifically, because a user's question may contain multiple query relationships, if the question is embedded directly into the question-answering model, the relevance weights of certain words in the sentence become higher and part of the retrieved knowledge is very likely to be lost.
In this scheme, in the step in which the question-answering model is trained to retrieve and recall in the document database according to each sub-question and generate a sub-answer corresponding to each sub-question, the question-answering model retrieves and recalls in the document database according to each sub-question to obtain the corresponding recall results, generates at least one word segment according to each recall result, and decodes the generated word segments to obtain the sub-answer corresponding to each sub-question.
Specifically, when the question-answering model performs retrieval and recall according to each sub-question, the recall may be performed by text recall or by semantic recall; this scheme imposes no limitation.
Further, if the recall results are wrong, the question-answering model cannot answer correctly, so when retrieving and recalling, the recall rate is normally valued more than precision: to guarantee the recall rate, a slightly larger recall number is set at the cost of some precision, whereas setting too small a recall number in order to guarantee precision lowers the recall rate. To prevent an excessive number of recall results from producing a large number of word segments and making the answers of the question-answering model overly redundant, a first number is set: once the question-answering model has generated the first number of word segments from the recall results of a sub-question, those word segments are decoded to obtain the sub-answer and the remaining word segments corresponding to that sub-question are discarded.
In this scheme, the purpose of generating the sub-answers from the recall results is to let the model understand the question with its own comprehension ability. Although a generated sub-answer may contain knowledge errors, the scheme screens out, by setting the recall threshold, the part of the sub-answer that may contain knowledge errors; that part may be wrong, but it is still a result obtained from the sub-question, so handling it separately reduces the possibility of generating wrong knowledge as much as possible while avoiding the omission of useful knowledge.
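A minimal sketch of the threshold split, assuming the question-answering model exposes a per-segment generation probability; the word segments and probability values below are invented for illustration.

```python
# Hypothetical sketch: split a generated sub-answer into the first recall result
# (high-confidence word segments, kept as trusted content) and the to-be-recalled
# result (low-confidence word segments, re-queried against the document database).

def split_by_recall_threshold(segments, threshold=0.7):
    """segments: list of (word_segment, generation_probability) pairs."""
    first_recall = [w for w, p in segments if p >= threshold]
    to_be_recalled = [w for w, p in segments if p < threshold]
    return first_recall, to_be_recalled

segments = [("Hangzhou", 0.95), ("has rich historical remains", 0.88),
            ("and therefore meets the criteria of an ancient capital", 0.38)]
first, pending = split_by_recall_threshold(segments, threshold=0.7)
print("first recall result:", first)
print("to-be-recalled result:", pending)
```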
In the step of performing recall in the document database with the question-answering model using the to-be-recalled result to obtain the second recall result, the question-answering model is used to generate a recall question corresponding to the to-be-recalled result, the recall question is posed to the question-answering model, and the question-answering model performs recall in the document database based on the recall question to obtain the second recall result.
Specifically, the knowledge-retrieval recall optimization designed in this scheme uses the intention-understanding capability of the question-answering model to refine all knowledge points related to the question, which lowers the difficulty of retrieval and recall from the knowledge base; generating, with the question-answering model, the recall question corresponding to the to-be-recalled result enriches the information of the retrieval question and improves recall accuracy, thereby optimizing the whole question-answering system, alleviating the low accuracy caused by the question-answering model lacking part of the knowledge, and further improving the performance of the question-answering model.
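The second recall could be sketched as follows; the helper names `embed` and `generate`, the prompt wording and the cosine-similarity retrieval are all assumptions introduced for illustration, standing in for the scheme's vectorization model, question-answering model and document-database recall.

```python
# Hypothetical sketch: turn the low-confidence (to-be-recalled) part of a sub-answer
# into a recall question and retrieve supplementary passages from the document
# database by cosine similarity. `embed` and `generate` are caller-supplied stand-ins.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def second_recall(to_be_recalled, documents, embed, generate, top_k=3):
    # 1. Ask the question-answering model to rewrite the uncertain fragment as a question.
    recall_question = generate(
        f"Rewrite the following fragment as a standalone question: {to_be_recalled}")
    # 2. Recall the most similar passages from the document database.
    qv = embed(recall_question)
    scored = sorted(documents, key=lambda d: cosine(embed(d), qv), reverse=True)
    return recall_question, scored[:top_k]   # second recall result

# Toy usage with stand-in models:
docs = ["Hangzhou served as a dynastic capital.", "Anyang hosts the Yin ruins.", "Unrelated text."]
toy_embed = lambda text: [text.lower().count(c) for c in "aeiou hn"]
toy_generate = lambda prompt: prompt.split(": ", 1)[-1] + "?"
question, passages = second_recall("meets the criteria of an ancient capital",
                                   docs, embed=toy_embed, generate=toy_generate, top_k=2)
print(question)
print(passages)
```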
In the scheme, in the step of carrying out vector fusion on a first recall result, a second recall result and a question to obtain a fusion result, and obtaining a question answer based on the fusion result by a question-answering model, each first recall result, each second recall result and the question are converted into vector representation through a vectorization model and fused to obtain an integration vector, and the question-answering model carries out knowledge reasoning based on the integration vector to obtain a question answer.
Specifically, the scheme uses a vectorization model to fuse each first recall result, each second recall result and the question into a fusion result, which is expressed by a formula:
wherein v' is the fusion result, K is the total number of first recall results and second recall results, t_i is the i-th first or second recall result, v is the question vector, and f is the vectorization model.
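The formula itself appears only as an image in the source and is not reproduced in the text above; one plausible form consistent with the variable definitions (an assumed reconstruction, not the patent's exact expression) is:

$$v' = v + \frac{1}{K}\sum_{i=1}^{K} f\left(t_i\right)$$

where the vectorization model f maps each first or second recall result t_i into the same vector space as the question vector v, and the averaged recall vectors are added to v to give the fusion result v'.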
Specifically, the vectorization model in the scheme can be any model.
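A minimal code sketch of that fusion step, assuming the additive-average form given above and a stand-in embedding function (not a specific library's API):

```python
# Hypothetical sketch: fuse the question vector with the first and second recall
# results, following the assumed additive-average formula above.
import numpy as np

def fuse(question_vec, recall_texts, embed):
    """question_vec: np.ndarray; recall_texts: first + second recall results."""
    if not recall_texts:
        return question_vec
    recall_vecs = np.stack([embed(t) for t in recall_texts])   # f(t_i) for each result
    return question_vec + recall_vecs.mean(axis=0)             # v' = v + (1/K) * sum f(t_i)

# Toy usage with a fake 4-dimensional embedding:
def toy_embed(text):
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(4)

v = toy_embed("How does Chairman Wang evaluate Hangzhou?")
fused = fuse(v, ["first recall passage", "second recall passage"], toy_embed)
print(fused.shape)   # (4,)
```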
Example two
Based on the same conception, referring to fig. 2, the application also provides an intelligent question-answering optimizing device based on the document, which comprises:
a construction module, configured to: construct a document database and a question-answering model, obtain questions belonging to the same field as the documents in the document database, segment each question to obtain parts of speech, and decompose each question into at least one sub-question according to the parts of speech;
a recall module, configured to: input each sub-question into the question-answering model, wherein the question-answering model is trained to retrieve and recall in the document database according to each sub-question and to generate a sub-answer corresponding to each sub-question; decompose each sub-answer into a first recall result and a to-be-recalled result according to a set recall threshold, wherein the part of the sub-answer whose word segments have a generation probability greater than or equal to the recall threshold is taken as the first recall result, and the part whose word segments have a generation probability smaller than the recall threshold is taken as the to-be-recalled result; and perform recall in the document database with the question-answering model using the to-be-recalled result to obtain a second recall result; and
a response module, configured to: perform vector fusion on the first recall result, the second recall result and the question to obtain a fusion result, and obtain, by the question-answering model, an answer to the question based on the fusion result.
Example III
The present embodiment also provides an electronic device, referring to fig. 3, comprising a memory 404 and a processor 402, the memory 404 having stored therein a computer program, the processor 402 being arranged to run the computer program to perform the steps of any of the above-mentioned document-based intelligent question-answer optimization methods.
In particular, the processor 402 may include a Central Processing Unit (CPU), or an Application Specific Integrated Circuit (ASIC), or may be configured to implement one or more integrated circuits of embodiments of the present application.
The memory 404 may include mass storage for data or instructions. By way of example, and not limitation, the memory 404 may comprise a Hard Disk Drive (HDD), a floppy disk drive, a Solid State Drive (SSD), flash memory, an optical disk, a magneto-optical disk, tape, or a Universal Serial Bus (USB) drive, or a combination of two or more of these. The memory 404 may include removable or non-removable (or fixed) media, where appropriate. The memory 404 may be internal or external to the data processing apparatus, where appropriate. In a particular embodiment, the memory 404 is a Non-Volatile memory. In particular embodiments, the memory 404 includes Read-Only Memory (ROM) and Random Access Memory (RAM). Where appropriate, the ROM may be a mask-programmed ROM, a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), an Electrically Rewritable ROM (EAROM) or FLASH memory, or a combination of two or more of these. Where appropriate, the RAM may be Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), and the DRAM may be Fast Page Mode Dynamic Random Access Memory (FPM DRAM), Extended Data Output Dynamic Random Access Memory (EDO DRAM), Synchronous Dynamic Random Access Memory (SDRAM), or the like.
Memory 404 may be used to store or cache various data files that need to be processed and/or used for communication, as well as possible computer program instructions for execution by processor 402.
Processor 402 implements any of the document-based intelligent question-answer optimization methods of the above embodiments by reading and executing computer program instructions stored in memory 404.
Optionally, the electronic apparatus may further include a transmission device 406 and an input/output device 408, where the transmission device 406 is connected to the processor 402 and the input/output device 408 is connected to the processor 402.
The transmission device 406 may be used to receive or transmit data via a network. Specific examples of the network described above may include a wired or wireless network provided by a communication provider of the electronic device. In one example, the transmission device includes a network adapter (Network Interface Controller, simply referred to as NIC) that can connect to other network devices through the base station to communicate with the internet. In one example, the transmission device 406 may be a Radio Frequency (RF) module, which is configured to communicate with the internet wirelessly.
The input-output device 408 is used to input or output information. In this embodiment, the input information may be a question or the like in the same field as the document database, and the output information may be an answer or the like to the corresponding question.
Alternatively, in the present embodiment, the above-mentioned processor 402 may be configured to execute the following steps by a computer program:
S101, constructing a document database and a question-answering model, obtaining questions belonging to the same field as documents in the document database, segmenting each question to obtain parts of speech, and decomposing each question into at least one sub-question according to the parts of speech;
S102, inputting each sub-question into the question-answering model, wherein the question-answering model is trained to retrieve and recall in the document database according to each sub-question and to generate a sub-answer corresponding to each sub-question; decomposing each sub-answer into a first recall result and a to-be-recalled result according to a set recall threshold, wherein the part of the sub-answer whose word segments have a generation probability greater than or equal to the recall threshold is taken as the first recall result, and the part whose word segments have a generation probability smaller than the recall threshold is taken as the to-be-recalled result; and performing recall in the document database with the question-answering model using the to-be-recalled result to obtain a second recall result;
and S103, vector fusion is carried out on the first recall result, the second recall result and the questions to obtain fusion results, and the question-answering model obtains answers to the questions based on the fusion results.
It should be noted that, specific examples in this embodiment may refer to examples described in the foregoing embodiments and alternative implementations, and this embodiment is not repeated herein.
In general, the various embodiments may be implemented in hardware or special purpose circuits, software, logic or any combination thereof. Some aspects of the invention may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto. While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
Embodiments of the invention may be implemented by computer software executable by a data processor of a mobile device, such as in a processor entity, or by hardware, or by a combination of software and hardware. Computer software or programs (also referred to as program products) including software routines, applets, and/or macros can be stored in any apparatus-readable data storage medium and they include program instructions for performing particular tasks. The computer program product may include one or more computer-executable components configured to perform embodiments when the program is run. The one or more computer-executable components may be at least one software code or a portion thereof. In this regard, it should also be noted that any block of the logic flow as in fig. 3 may represent a procedure step, or interconnected logic circuits, blocks and functions, or a combination of procedure steps and logic circuits, blocks and functions. The software may be stored on a physical medium such as a memory chip or memory block implemented within a processor, a magnetic medium such as a hard disk or floppy disk, and an optical medium such as, for example, a DVD and its data variants, a CD, etc. The physical medium is a non-transitory medium.
It should be understood by those skilled in the art that the technical features of the above embodiments may be combined in any manner, and for brevity, all of the possible combinations of the technical features of the above embodiments are not described, however, they should be considered as being within the scope of the description provided herein, as long as there is no contradiction between the combinations of the technical features.
The foregoing examples merely represent several embodiments of the present application, the description of which is more specific and detailed and which should not be construed as limiting the scope of the present application in any way. It should be noted that it would be apparent to those skilled in the art that various modifications and improvements could be made without departing from the spirit of the present application, which would be within the scope of the present application. Accordingly, the scope of protection of the present application shall be subject to the appended claims.

Claims (10)

1. The intelligent question-answering optimization method based on the document is characterized by comprising the following steps of:
constructing a document database and a question-answering model, obtaining questions belonging to the same field as documents in the document database, segmenting each question to obtain parts of speech, and decomposing each question into at least one sub-question according to the parts of speech;
inputting each sub-question into the question-answering model, wherein the question-answering model is trained to retrieve and recall in the document database according to each sub-question and to generate a sub-answer corresponding to each sub-question; decomposing each sub-answer into a first recall result and a to-be-recalled result according to a set recall threshold, wherein the part of the sub-answer whose word segments have a generation probability greater than or equal to the recall threshold is taken as the first recall result, and the part whose word segments have a generation probability smaller than the recall threshold is taken as the to-be-recalled result; and performing recall in the document database with the question-answering model using the to-be-recalled result to obtain a second recall result;
and carrying out vector fusion on the first recall result, the second recall result and the question to obtain a fusion result, and obtaining a question answer based on the fusion result by the question-answering model.
2. The intelligent question-answering optimization method based on the document according to claim 1, wherein in the step of segmenting the question to obtain parts of speech and decomposing the question into at least one sub-question according to the parts of speech, the question is segmented into words to obtain their parts of speech, and syntactic structure units corresponding to the subject, the predicate and the object are obtained according to the syntactic dependencies of the question, wherein the syntactic dependencies comprise the subject-predicate relationship, the adverbial-head relationship and the verb-object relationship; the question is divided into a plurality of levels according to the different syntactic dependencies in the question, each level corresponding to one syntactic dependency; within each level, different vocabularies in the current syntactic structure unit are combined according to the parts of speech in each syntactic structure unit to obtain new syntactic structure units, and the new syntactic structure units are combined according to the syntactic dependencies to serve as the sub-questions.
3. The intelligent question-answering optimization method based on the document according to claim 2, wherein in the step of segmenting the question to obtain parts of speech and obtaining syntactic structure units corresponding to the subject, the predicate and the object according to the syntactic dependencies of the question, the question is segmented into words to obtain the part of speech of each word, and the subject-predicate, adverbial-head and verb-object relationships are identified by combining the parts of speech of the words with the syntactic dependencies of the question; the noun or noun phrase in the subject-predicate relationship is identified as the subject, the verb in the identified relationships is the predicate, and the noun or noun phrase paired with the predicate in the verb-object relationship is identified as the object.
4. The intelligent question-answering optimization method based on the document according to claim 1, wherein in the step in which the question-answering model is trained to retrieve and recall in the document database according to each sub-question and generate a sub-answer corresponding to each sub-question, the question-answering model retrieves and recalls in the document database according to each sub-question to obtain corresponding recall results, generates at least one word segment according to each recall result, and decodes the generated word segments to obtain the sub-answer corresponding to each sub-question.
5. The intelligent question-answering optimizing method based on documents according to claim 1, wherein a first number is set, when the question-answering model generates a first number of word segments according to recall results of a sub-question, decoding the first number of word segments to obtain sub-answers, and discarding the rest of word segments corresponding to the sub-question.
6. The intelligent question-answering optimization method based on the document according to claim 1, wherein in the step of performing recall in the document database with the question-answering model using the to-be-recalled result to obtain the second recall result, a recall question corresponding to the to-be-recalled result is generated by using the question-answering model, the recall question is posed to the question-answering model, and the question-answering model performs recall in the document database based on the recall question to obtain the second recall result.
7. The intelligent question-answering optimization method based on the document according to claim 1, wherein in the step of carrying out vector fusion on a first recall result, a second recall result and a question to obtain a fusion result, the question-answering model obtains a question answer based on the fusion result, each first recall result, each second recall result and the question are converted into vector representations through a vectorization model and are fused to obtain an integration vector, and the question-answering model carries out knowledge reasoning based on the integration vector to obtain the question answer.
8. An intelligent question-answering optimizing device based on a document, which is characterized by comprising:
a construction module, configured to: construct a document database and a question-answering model, obtain questions belonging to the same field as the documents in the document database, segment each question to obtain parts of speech, and decompose each question into at least one sub-question according to the parts of speech;
a recall module, configured to: input each sub-question into the question-answering model, wherein the question-answering model is trained to retrieve and recall in the document database according to each sub-question and to generate a sub-answer corresponding to each sub-question; decompose each sub-answer into a first recall result and a to-be-recalled result according to a set recall threshold, wherein the part of the sub-answer whose word segments have a generation probability greater than or equal to the recall threshold is taken as the first recall result, and the part whose word segments have a generation probability smaller than the recall threshold is taken as the to-be-recalled result; and perform recall in the document database with the question-answering model using the to-be-recalled result to obtain a second recall result; and
a response module, configured to: perform vector fusion on the first recall result, the second recall result and the question to obtain a fusion result, and obtain, by the question-answering model, an answer to the question based on the fusion result.
9. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, the processor being arranged to run the computer program to perform a document-based intelligent question-answer optimization method according to any one of claims 1-7.
10. A readable storage medium, characterized in that the readable storage medium has stored therein a computer program comprising program code for controlling a process to execute a process comprising a document based intelligent question-answer optimization method according to any of the claims 1-7.
CN202311406253.4A 2023-10-25 2023-10-25 Intelligent question-answering optimization method, device and application based on document Pending CN117520492A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311406253.4A CN117520492A (en) 2023-10-25 2023-10-25 Intelligent question-answering optimization method, device and application based on document

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311406253.4A CN117520492A (en) 2023-10-25 2023-10-25 Intelligent question-answering optimization method, device and application based on document

Publications (1)

Publication Number Publication Date
CN117520492A true CN117520492A (en) 2024-02-06

Family

ID=89752240

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311406253.4A Pending CN117520492A (en) 2023-10-25 2023-10-25 Intelligent question-answering optimization method, device and application based on document

Country Status (1)

Country Link
CN (1) CN117520492A (en)

Similar Documents

Publication Publication Date Title
CN109344236B (en) Problem similarity calculation method based on multiple characteristics
CN111353310B (en) Named entity identification method and device based on artificial intelligence and electronic equipment
Duan et al. Question generation for question answering
CN108959312B (en) Method, device and terminal for generating multi-document abstract
CN106649742B (en) Database maintenance method and device
US10061766B2 (en) Systems and methods for domain-specific machine-interpretation of input data
US8332434B2 (en) Method and system for finding appropriate semantic web ontology terms from words
CN112287670A (en) Text error correction method, system, computer device and readable storage medium
CN111159359B (en) Document retrieval method, device and computer readable storage medium
CN110457708B (en) Vocabulary mining method and device based on artificial intelligence, server and storage medium
CN106484682A (en) Based on the machine translation method of statistics, device and electronic equipment
US20220343082A1 (en) System and method for ensemble question answering
US9720962B2 (en) Answering superlative questions with a question and answer system
WO2014155209A1 (en) User collaboration for answer generation in question and answer system
CN116805001A (en) Intelligent question-answering system and method suitable for vertical field and application of intelligent question-answering system and method
US10810245B2 (en) Hybrid method of building topic ontologies for publisher and marketer content and ad recommendations
CN111104803B (en) Semantic understanding processing method, device, equipment and readable storage medium
JP2022115815A (en) Semantic code search based on augmented programming language corpus
CN112581327B (en) Knowledge graph-based law recommendation method and device and electronic equipment
WO2024011813A1 (en) Text expansion method and apparatus, device, and medium
CN115309910B (en) Language-text element and element relation joint extraction method and knowledge graph construction method
CN113779062A (en) SQL statement generation method and device, storage medium and electronic equipment
CN112149427A (en) Method for constructing verb phrase implication map and related equipment
CN117094323A (en) Document relation extraction method and system for knowledge graph construction
US10296585B2 (en) Assisted free form decision definition using rules vocabulary

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination