CN117556057A - Knowledge question-answering method, vector database construction method and device - Google Patents

Info

Publication number
CN117556057A
Authority
CN
China
Prior art keywords
vector
document
block
target
information block
Legal status
Pending
Application number
CN202311566499.8A
Other languages
Chinese (zh)
Inventor
陈祚松
谭学士
李云龙
李洪亮
Current Assignee
Qianxin Technology Group Co Ltd
Original Assignee
Qianxin Technology Group Co Ltd
Application filed by Qianxin Technology Group Co Ltd
Priority to CN202311566499.8A
Publication of CN117556057A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Animal Behavior & Ethology (AREA)
  • Human Computer Interaction (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a knowledge question-answering method, a vector database construction method and a device, and relates to the technical field of artificial intelligence. The method comprises: converting a question to be answered into a question vector; determining, in a vector database, a target information block vector that matches the question vector, wherein the vector database stores at least one information block vector, each information block vector is determined based on a document block, and the document blocks are obtained by dividing a document based on event integrity; determining a target document block corresponding to the target information block vector; determining target knowledge corresponding to the question to be answered based on the target document block; and outputting the target knowledge. Because the document blocks are obtained by dividing the document based on event integrity, the target document block corresponding to the determined target information block vector can express a complete event, so the associated information obtained for the question to be answered is more complete and the accuracy of knowledge question answering can be improved.

Description

Knowledge question-answering method, vector database construction method and device
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a knowledge question-answering method, a vector database construction method and a vector database construction device.
Background
As the ability of large language models to understand and reason about natural language in the general domain has improved dramatically, more and more fields have begun exploring vertical-domain question-answering applications built on large language models.
In the related art, a domain-specific document is generally divided into a plurality of information blocks according to punctuation marks, the information blocks are converted into information block vectors, and each information block is stored in a vector database together with its information block vector. When a question input by a user is received, the question is converted into a question vector and the vector database is searched for an information block vector that matches the question vector. Taking the information block corresponding to the matched vector as the center, that information block and the adjacent information within a preset length threshold are taken as the associated information of the question, and the answer to the question is then determined based on that associated information.
However, because the information blocks stored in the vector database in the related art are divided based on punctuation marks, associated information gathered around a matched block up to a preset length threshold may be incomplete, which reduces the accuracy of knowledge question answering.
Disclosure of Invention
To address the problems in the prior art, embodiments of the invention provide a knowledge question-answering method, a vector database construction method and a vector database construction device.
The invention provides a knowledge question-answering method, which comprises the following steps:
converting a question to be answered into a question vector;
determining, in a vector database, a target information block vector that matches the question vector, wherein at least one information block vector is stored in the vector database, each information block vector is determined based on a document block, and the document blocks are obtained by dividing documents based on event integrity;
determining a target document block corresponding to the target information block vector;
determining target knowledge corresponding to the question to be answered based on the target document block;
and outputting the target knowledge.
According to the knowledge question-answering method provided by the invention, the vector database further stores the identifier of the document block to which each information block vector belongs;
determining the target document block corresponding to the target information block vector comprises:
determining, in the vector database, the identifier of the target document block to which the target information block vector belongs;
determining, in a document database, the target document block corresponding to the identifier of the target document block, wherein the document database stores the correspondence between each document block and its identifier.
According to the knowledge question-answering method provided by the invention, the method further comprises the following steps:
for each document block, acquiring at least one preset question corresponding to the document block;
concatenating the document block with all of its corresponding preset questions to obtain a new document block;
dividing the new document block based on preset semantic delimiters to obtain at least one information block corresponding to the new document block;
converting each information block into an information block vector;
and constructing the vector database based on the information block vectors and the identifiers of the document blocks to which they belong.
According to the knowledge question-answering method provided by the invention, the method further comprises the following steps:
performing semantic analysis on each sentence in a document to obtain a semantic analysis result for each sentence;
determining, based on the semantic analysis results, at least one target sentence that characterizes the same complete event;
dividing the at least one target sentence into one document block;
and constructing the document database based on the document blocks and their identifiers.
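The event-integrity division above can be sketched as follows. This is a minimal illustration, assuming the semantic analysis model is reduced to a hypothetical `label_event` callable that maps each sentence to an event label; the keyword-based `toy_labeler` is only a stand-in, and the `blk-<n>` identifier format is also an assumption. Sentences that share a label are grouped into one document block, in order of first appearance.

```python
from typing import Callable, Dict, List


def build_document_db(
    sentences: List[str],
    label_event: Callable[[str], str],
) -> Dict[str, str]:
    """Group sentences describing the same event into one document block.

    `label_event` is a hypothetical stand-in for the semantic analysis model.
    Returns the document database: block identifier -> document block text.
    """
    groups: Dict[str, List[str]] = {}
    for sentence in sentences:
        # dicts preserve insertion order, so blocks follow first appearance
        groups.setdefault(label_event(sentence), []).append(sentence)
    return {f"blk-{i}": " ".join(members) for i, members in enumerate(groups.values())}


def toy_labeler(sentence: str) -> str:
    """Keyword rules for illustration only; not a real semantic model."""
    text = sentence.lower()
    if "intranet" in text:
        return "intranet-access"
    if "reward" in text or "penalt" in text:
        return "rewards-penalties"
    return "other"


if __name__ == "__main__":
    doc = [
        "Employee computers must install the security client before joining the intranet.",
        "Personal devices may not connect to the intranet.",
        "Rewards are granted quarterly.",
        "Intranet access requests are approved by IT.",
    ]
    for block_id, block in build_document_db(doc, toy_labeler).items():
        print(block_id, "->", block)
```

Note that the two intranet sentences separated by an unrelated sentence still land in the same block; whether a real implementation should also require adjacency is not stated in the text.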
According to the knowledge question-answering method provided by the invention, determining the target knowledge corresponding to the question to be answered based on the target document block comprises:
inputting the target document block and the question to be answered into a large language model, which performs semantic understanding of the target document block in light of the question to obtain the target knowledge corresponding to the question to be answered.
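A sketch of how the target document block and the question might be handed to a large language model. The prompt wording, `build_prompt`, and the `llm` callable are hypothetical; any chat or completion client could sit behind `llm`.

```python
from typing import Callable

# Hypothetical prompt wording; the patent does not specify one.
PROMPT_TEMPLATE = (
    "Answer the question using only the reference text.\n"
    "Reference text:\n{block}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def build_prompt(question: str, target_block: str) -> str:
    """Combine the target document block and the question into one prompt."""
    return PROMPT_TEMPLATE.format(block=target_block, question=question)


def answer_question(question: str, target_block: str, llm: Callable[[str], str]) -> str:
    """`llm` is any callable that sends a prompt to a large language model
    and returns its text reply (hypothetical interface)."""
    return llm(build_prompt(question, target_block)).strip()
```

Passing the whole document block, rather than a fixed-length window around a matched sentence, is the point of the method: the model sees the complete event.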
According to the knowledge question-answering method provided by the invention, determining, in the vector database, the target information block vector that matches the question vector comprises:
determining the similarity between the question vector and each information block vector in the vector database;
when at least two first information block vectors have a similarity greater than a preset similarity, outputting the information blocks corresponding to those first information block vectors;
receiving a selection instruction, input by the user, for selecting a target information block;
and in response to the selection instruction, determining the first information block vector corresponding to the target information block as the target information block vector.
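A minimal sketch of the threshold-and-select logic, assuming cosine similarity as the similarity measure and a `choose` callback that stands in for the user's selection instruction (both are assumptions; the text fixes neither). The behaviour when no vector exceeds the threshold is not specified, so this sketch simply returns `None`.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_target(
    question_vec: Vector,
    entries: Sequence[Tuple[Vector, str]],  # (information block vector, information block text)
    threshold: float,                       # the "preset similarity"
    choose: Callable[[List[str]], int],     # stands in for the user's selection instruction
) -> Optional[int]:
    """Return the index of the target information block vector, or None."""
    candidates = [i for i, (vec, _) in enumerate(entries) if cosine(question_vec, vec) > threshold]
    if not candidates:
        return None  # not specified in the text; assumed
    if len(candidates) == 1:
        return candidates[0]
    # Two or more first information block vectors: show their blocks, let the user pick
    picked = choose([entries[i][1] for i in candidates])
    return candidates[picked]
```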
According to the knowledge question-answering method provided by the invention, converting the question to be answered into the question vector comprises:
inputting the question to be answered into a vector conversion model to obtain the question vector output by the vector conversion model, wherein the vector conversion model is obtained by training an initial vector conversion model based on question samples and label information of the question samples, the label information representing the vector sample corresponding to each question sample.
The invention also provides a vector database construction method, which comprises the following steps:
dividing the document based on event integrity to obtain at least one document block;
for each document block, acquiring at least one preset question corresponding to the document block;
concatenating the document block with all of its corresponding preset questions to obtain a new document block;
dividing the new document block based on preset semantic delimiters to obtain at least one information block corresponding to the new document block;
converting each information block into an information block vector;
and constructing the vector database based on the information block vectors.
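The construction steps can be sketched as a single loop. This assumes the vector database is modelled as an in-memory list of `(vector, block_id)` pairs, and that `split` and `embed` are placeholders for the preset semantic delimiter rule and the document vector conversion model; none of these names come from the patent.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]


def build_vector_db(
    document_db: Dict[str, str],                  # block identifier -> document block
    preset_questions: Dict[str, Sequence[str]],   # block identifier -> preset questions
    split: Callable[[str], List[str]],            # placeholder semantic segmenter
    embed: Callable[[str], Vector],               # placeholder vector conversion model
    questions_first: bool = False,
) -> List[Tuple[Vector, str]]:
    """Return (information block vector, document block identifier) pairs."""
    vector_db: List[Tuple[Vector, str]] = []
    for block_id, block in document_db.items():
        questions = list(preset_questions.get(block_id, []))
        # Splice: preset questions go at the end (or the start) of the block
        parts = questions + [block] if questions_first else [block] + questions
        new_block = "\n".join(parts)
        # Segment the new block and embed every resulting information block
        for info_block in split(new_block):
            vector_db.append((embed(info_block), block_id))
    return vector_db


if __name__ == "__main__":
    import re

    db = build_vector_db(
        {"blk-0": "Install the security client. Personal devices are not allowed."},
        {"blk-0": ["What must an employee computer install?"]},
        split=lambda t: [s.strip() for s in re.split(r"[.;\n]", t) if s.strip()],
        embed=lambda s: [float(len(s))],  # placeholder embedding, not a real model
    )
    print([block_id for _, block_id in db])  # ['blk-0', 'blk-0', 'blk-0']
```

Every information block, whether it came from the content or from a preset question, keeps the identifier of its owning document block, which is what later allows the whole block to be recalled.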
The invention also provides a knowledge question-answering device, which comprises:
a first conversion unit, configured to convert a question to be answered into a question vector;
a first determining unit, configured to determine, in a vector database, a target information block vector that matches the question vector, wherein at least one information block vector is stored in the vector database, each information block vector is determined based on a document block, and the document blocks are obtained by dividing documents based on event integrity;
a second determining unit, configured to determine a target document block corresponding to the target information block vector;
a third determining unit, configured to determine target knowledge corresponding to the question to be answered based on the target document block;
and the output unit is used for outputting the target knowledge.
The invention also provides a vector database construction device, which comprises:
the dividing unit is used for dividing the document based on event integrity to obtain at least one document block;
an obtaining unit, configured to obtain, for each document block, at least one preset question corresponding to the document block;
a splicing unit, configured to concatenate each document block with all of its corresponding preset questions to obtain a new document block;
a segmentation unit, configured to segment the new document block based on preset semantic delimiters to obtain at least one information block corresponding to the new document block;
a second conversion unit configured to convert each of the information blocks into an information block vector, respectively;
a first construction unit for constructing the vector database based on each of the information block vectors.
The invention also provides an electronic device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the knowledge question-answering method according to any one of the above or realizes the vector database construction method according to any one of the above when executing the program.
The present invention also provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements the knowledge question-answering method according to any one of the above or the vector database construction method according to any one of the above.
The invention also provides a computer program product comprising a computer program which, when executed by a processor, implements the knowledge question-answering method according to any one of the above or the vector database construction method according to any one of the above.
With the knowledge question-answering method, the vector database construction method and the devices provided by the invention, an acquired question to be answered is converted into a question vector, a target information block vector matching the question vector is determined in the vector database, the target document block corresponding to the target information block vector is determined and taken as the associated knowledge of the question, and the target knowledge corresponding to the question is then determined based on that associated knowledge. Because the document blocks are obtained by dividing documents based on event integrity, the target document block corresponding to the determined target information block vector can express a complete event, so the associated information obtained for the question is more complete and the accuracy of knowledge question answering can be improved.
Drawings
In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is the first flowchart of a knowledge question-answering method according to an embodiment of the present invention;
FIG. 2 is the second flowchart of a knowledge question-answering method according to an embodiment of the present invention;
FIG. 3 is the third flowchart of a knowledge question-answering method according to an embodiment of the present invention;
FIG. 4 is a flowchart of a vector database construction method according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a knowledge question-answering device according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of the physical structure of an electronic device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The knowledge question-answering method of the present invention is described below with reference to FIG. 1 to FIG. 3. The method may be executed by an electronic device such as a terminal or a computer, or by a knowledge question-answering device arranged in the electronic device; the knowledge question-answering device may be implemented in software, hardware or a combination of the two.
FIG. 1 is the first flowchart of the knowledge question-answering method provided by an embodiment of the present invention. As shown in FIG. 1, the knowledge question-answering method includes the following steps:
Step 101: convert the question to be answered into a question vector.
For example, the user may input the question to be answered through the electronic device, either as text or as speech. When the electronic device receives a spoken question, it first converts the speech into text and then converts the text question into a question vector.
Step 102: determine, in a vector database, a target information block vector that matches the question vector, wherein at least one information block vector is stored in the vector database, each information block vector is determined based on a document block, and the document blocks are obtained by dividing documents based on event integrity.
The main purpose of dividing a document based on event integrity is to place the document information belonging to one complete event into the same document block, which facilitates subsequent processing and analysis. For example, consider a document on an enterprise's employee computer management policy in which one section, spanning several paragraphs, describes the requirements for employee computers using the enterprise intranet. If the document were divided semantically (for example, splitting the text at periods, other punctuation marks or paragraph line breaks), that section would be broken into many small information blocks. With event integrity as the division criterion, all paragraphs of the section, which together form the document information of one complete event (the requirements for employee computers using the enterprise intranet), are placed in the same document block, so each resulting document block represents a complete event. In other cases, the division may be based on events related to a particular enterprise so that the resulting document blocks are relatively complete; examples of such events include the requirements for employee computers using the enterprise intranet, enterprise security policy management, enterprise rewards and penalties policy management, and enterprise business process management.
For example, once the question vector corresponding to the question to be answered is obtained, the similarity between the question vector and each information block vector in the vector database is determined, and the information block vector with the highest similarity is determined as the target information block vector that matches the question vector.
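The highest-similarity match of step 102 might look like this. It is a sketch assuming cosine similarity (the text says only "similarity") and a hypothetical list of `(vector, block_id)` pairs as the vector database; a production system would normally delegate this search to a vector index.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(question_vec: Vector, vector_db: Sequence[Tuple[Vector, str]]) -> int:
    """Index of the information block vector with the highest similarity.

    Raises ValueError if the vector database is empty.
    """
    scores = [cosine_similarity(question_vec, vec) for vec, _ in vector_db]
    return max(range(len(scores)), key=scores.__getitem__)
```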
Step 103: determine the target document block corresponding to the target information block vector.
For example, once the target information block vector is determined, the corresponding target document block may be determined based on the membership relationship between information block vectors and document blocks.
Step 104: determine the target knowledge corresponding to the question to be answered based on the target document block.
Once the target document block is obtained, it is used as the associated information of the question to be answered, and its contents are analyzed and summarized with respect to the question to obtain the target knowledge corresponding to the question, namely the answer to the question.
Step 105: output the target knowledge.
For example, once the target knowledge corresponding to the question to be answered is obtained, it is displayed on the display screen of the electronic device, played through a speaker of the electronic device, or sent to another device with a display function, such as a projector, for display.
The knowledge question-answering method provided by the invention converts an acquired question to be answered into a question vector, first determines a target information block vector matching the question vector in the vector database, then determines the target document block corresponding to that vector, takes the target document block as the associated knowledge of the question, and determines the target knowledge corresponding to the question based on that associated knowledge. Because the document blocks are obtained by dividing documents based on event integrity, the target document block corresponding to the determined target information block vector can express a complete event, so the associated information obtained for the question is more complete and the accuracy of knowledge question answering can be improved.
In an embodiment, the vector database further stores the identifier of the document block to which each information block vector belongs, and step 103 may be implemented as follows:
determining, in the vector database, the identifier of the target document block to which the target information block vector belongs; and determining, in a document database, the target document block corresponding to that identifier, wherein the document database stores the correspondence between each document block and its identifier.
The identifier of a document block uniquely identifies that document block.
For example, the identifier of the document block associated with the target information block vector is looked up in the vector database and taken as the identifier of the target document block; the document block corresponding to that identifier is then looked up in the document database and taken as the target document block.
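The two-stage lookup described above, sketched with a list of `(vector, block_id)` pairs and a plain dictionary standing in for the vector database and document database (an assumption; any vector store and key-value store could fill these roles):

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def find_target_document_block(
    target_index: int,
    vector_db: Sequence[Tuple[Vector, str]],
    document_db: Dict[str, str],
) -> str:
    # Stage 1: the vector database row holds the identifier of the owning block
    _, block_id = vector_db[target_index]
    # Stage 2: the document database maps that identifier to the full block text
    return document_db[block_id]
```

Because several information block vectors (content sentences and preset questions alike) can share one identifier, a match on any of them returns the same complete document block.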
In an embodiment, FIG. 2 is the second flowchart of the knowledge question-answering method according to an embodiment of the present invention. As shown in FIG. 2, before step 101, the knowledge question-answering method further includes the following steps:
Step 106: for each document block, obtain at least one preset question corresponding to the document block.
Illustratively, a complete document containing knowledge about a particular enterprise is divided based on event integrity to obtain a plurality of document blocks. For example, in a document on an enterprise's employee computer management policy, one section spanning several paragraphs describes the requirements for employee computers using the enterprise intranet. Semantic division (splitting the text at periods, other punctuation marks, paragraph line breaks and the like) would break that section into many small information blocks, even though the requirements for employee computers using the enterprise intranet form a single, relatively complete event. Under event integrity, the paragraphs of the section, being the document information of that one complete event, are placed in the same document block as complete information, so each document block obtained by the division of the present invention represents a complete event.
Further, for each document block obtained by dividing the document based on event integrity, several corresponding preset questions are set, and the preset questions are stored together with the document block. For example, if document block 1 describes the requirements for employee computers using the company intranet, preset question 1 may be "What conditions must an employee computer meet to access the company intranet?" and preset question 2 may be "What can connect to the company intranet?". Document block 1, preset question 1 and preset question 2 are then stored together, so that at least one preset question corresponding to each document block can be obtained.
Step 107: concatenate the document block with all of its corresponding preset questions to obtain a new document block.
Once all preset questions corresponding to a document block are obtained, they are appended to the end of the document block or prepended to its start, so that the document block and its preset questions are concatenated into a new document block containing both.
Step 108: divide the new document block based on preset semantic delimiters to obtain at least one information block corresponding to the new document block.
For example, once the new document block is obtained, it is segmented based on preset semantic delimiters, such as periods, semicolons and paragraph line breaks, finally yielding at least one information block corresponding to the new document block; together, these information blocks make up the new document block.
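One possible segmenter for step 108. It assumes the delimiter set is the full-width period and semicolon, the ASCII semicolon, line breaks, and an ASCII period followed by whitespace or the end of the text (so decimals such as `3.5` stay intact); the exact set is an assumption, since the text only lists examples.

```python
import re
from typing import List

# Assumed preset semantic delimiters (see lead-in); a period only counts
# when followed by whitespace or the end of the text.
SEMANTIC_DELIMITERS = re.compile(r"[。；;\n]+|\.(?:\s+|$)")


def split_into_information_blocks(new_document_block: str) -> List[str]:
    """Split a new document block into non-empty information blocks."""
    pieces = SEMANTIC_DELIMITERS.split(new_document_block)
    return [piece.strip() for piece in pieces if piece.strip()]
```

Question marks are deliberately not delimiters here, so each appended preset question survives as its own information block once the line break before it has split it off.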
Step 109: convert each information block into an information block vector.
Once the information blocks corresponding to a new document block are obtained, word embedding is performed on each information block using a document vector conversion model to obtain the information block vector output by the model for each information block. The document vector conversion model is obtained by training an initial document vector conversion model based on information samples and the ground-truth vector labels corresponding to the information samples.
The training process of the document vector conversion model is as follows: a plurality of information samples are input into the initial document vector conversion model, which performs feature analysis on them and outputs a predicted vector for each information sample; a loss function is constructed based on the predicted vectors and the ground-truth vector labels of the corresponding information samples; and the model parameters of the initial document vector conversion model are adjusted based on the loss function until a convergence condition is reached, yielding the document vector conversion model.
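The training loop above can be illustrated with a deliberately tiny stand-in: a linear map trained by gradient descent on the mean squared error between predicted vectors and ground-truth vector labels. The patent allows DNN or CNN architectures and does not name the loss; the linear model, MSE loss, learning rate and convergence tolerance here are all assumptions, chosen only to show the predict, compute-loss, update, check-convergence cycle.

```python
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def train_linear_encoder(
    features: Sequence[Sequence[float]],  # hypothetical numeric features of information samples
    labels: Sequence[Sequence[float]],    # ground-truth vector labels
    lr: float = 0.1,
    epochs: int = 500,
    tol: float = 1e-8,
    seed: int = 0,
) -> Tuple[Matrix, float]:
    """Fit W so that W @ x approximates y, minimising mean squared error."""
    rng = random.Random(seed)
    in_dim, out_dim, n = len(features[0]), len(labels[0]), len(features)
    W = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    prev_loss = float("inf")
    loss = prev_loss
    for _ in range(epochs):
        grad = [[0.0] * in_dim for _ in range(out_dim)]
        loss = 0.0
        for x, y in zip(features, labels):
            pred = [sum(w * xj for w, xj in zip(row, x)) for row in W]  # predicted vector
            for i in range(out_dim):
                err = pred[i] - y[i]
                loss += err * err / n
                for j in range(in_dim):
                    grad[i][j] += 2 * err * x[j] / n
        # Adjust parameters along the negative gradient of the loss
        for i in range(out_dim):
            for j in range(in_dim):
                W[i][j] -= lr * grad[i][j]
        if prev_loss - loss < tol:  # convergence condition (assumed form)
            break
        prev_loss = loss
    return W, loss
```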
It should be noted that, the specific structure of the initial document vector conversion model may be a deep neural network (Deep Neural Networks, DNN) or a convolutional neural network (Convolutional Neural Network, CNN), which is not limited in the present invention.
Step 110, constructing the vector database based on each information block vector and the identification of the document block to which each information block vector belongs.
For example, since each information block is a part of a new document block, any one information block belongs to a new document block, when an information block vector corresponding to each information block is obtained, the corresponding relationship between the information block vector and the identifier of the document block to which the information block vector belongs may be stored in the vector database, so as to obtain a constructed vector database.
In this embodiment, several preset questions are set for each document block, and all preset questions of a document block are spliced together with the document block. The resulting new document block is segmented at a preset semantic segmenter into information blocks, and the information block vector of each information block is stored in the vector database in one-to-one correspondence with the identifier of its document block. The constructed vector database therefore contains both the information block vectors of the preset questions and the information block vectors of the document block content. Because the preset questions are written in advance from the content of the document blocks, a question vector can be preferentially matched to the information block vectors of the preset questions, so the information block vector matched for the question to be answered is more accurate. This improves the accuracy of the target document block retrieved for the question to be answered, that is, the accuracy of associated-information recall, and further helps ensure the completeness of the content recalled for question answering.
In an embodiment, fig. 3 is a third flowchart of the knowledge question-answering method according to an embodiment of the present invention. As shown in fig. 3, before step 101, the knowledge question-answering method further includes the following steps:
Step 111, performing semantic analysis on each sentence in the document to obtain a semantic analysis result corresponding to each sentence.
For example, the document can be input into a semantic analysis model, which splits the document into sentences, performs semantic analysis on each sentence, and outputs the semantic analysis result corresponding to each sentence.
It should be noted that the semantic analysis model is trained as follows. Document samples are input into an initial semantic analysis model, which outputs a predicted semantic analysis result for each sentence sample in the document samples. A loss function is constructed from the predicted semantic analysis results and the real semantic labels, and the model parameters of the initial semantic analysis model are adjusted based on the loss function until a convergence condition is reached, yielding the semantic analysis model.
It should be noted that the initial semantic analysis model may specifically be a deep neural network or a convolutional neural network; the present invention does not limit this.
Step 112, determining at least one target sentence which characterizes the same event integrity based on each semantic analysis result.
For example, once the semantic analysis result of each sentence is obtained, the results that represent the same complete event are grouped together, and the sentences corresponding to the grouped results are determined as target sentences; that is, all of the target sentences relate to the same complete event.
Step 113, dividing the at least one target sentence into a document block.
For example, since the target sentences all relate to the same complete event, they can be taken together as one document block, which completes the division of the document based on event integrity.
Step 114, constructing the document database based on the document blocks and the identifiers of the document blocks.
For example, once the document blocks divided based on event integrity are obtained, each document block is assigned an identifier that uniquely represents it, and the correspondence between each document block and its identifier is stored in the document database, completing the construction of the document database.
In this embodiment, at least one target sentence representing the same complete event is determined from the semantic analysis result of each sentence in the document, and all target sentences representing that event are taken as one document block. Each document block in the constructed document database therefore fully represents one event, so when the document database is used later, complete associated information related to the question to be answered can be found in it. This improves the completeness of the associated information for the question to be answered and, in turn, the accuracy of knowledge question answering.
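Steps 112 to 114 can be sketched as follows, assuming (hypothetically) that the semantic analysis model of step 111 has already labelled each sentence with an event label, so that sentences with the same label characterize the same event. The label format, the `block-N` identifier scheme and the function name are assumptions for illustration only.

```python
def build_document_database(labelled_sentences):
    """Steps 112-114 sketch.

    labelled_sentences: list of (sentence, event_label) pairs; the event label is a
    hypothetical stand-in for the semantic analysis result of step 111.
    Returns a dict mapping a unique document block identifier to the block text.
    """
    groups: dict[str, list[str]] = {}
    for sentence, label in labelled_sentences:
        # Step 112: sentences with the same event label are target sentences of one event.
        groups.setdefault(label, []).append(sentence)
    document_db: dict[str, str] = {}
    for n, sentences in enumerate(groups.values(), start=1):
        block_id = f"block-{n}"                      # Step 114: unique identifier (hypothetical format)
        document_db[block_id] = " ".join(sentences)  # Step 113: one event -> one document block
    return document_db
```

In this sketch, sentences that share a label are merged even if they are not adjacent in the document; the text above does not state whether adjacency is required.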
In an embodiment, determining, in step 104, the target knowledge corresponding to the question to be answered based on the target document block may specifically be implemented as follows:
inputting the target document block and the question to be answered into a large language model, and performing, through the large language model, semantic understanding of the target document block based on the question to be answered, to obtain the target knowledge corresponding to the question to be answered.
A large language model is a deep learning model trained on large amounts of text data that can generate natural language text and understand the meaning of language text. It can handle a variety of natural language tasks, such as text classification, question answering, and dialogue.
For example, a preset prompt template includes a known-information section, prompt content instructing the model to answer based on the known information, and a question section. The target document block is inserted at the known-information section of the preset prompt template, and the question to be answered is inserted at the question section, which formats the target document block and the question to be answered into a new prompt. The new prompt is input into the large language model, which, guided by the prompt content and the inserted question, performs semantic understanding and summarization of the target document block to obtain the target knowledge corresponding to the question to be answered, that is, the answer to the question.
In this embodiment, the large language model performs semantic understanding of the target document block based on the question to be answered to obtain the corresponding target knowledge. Because the determined target document block is more accurate, the finally determined target knowledge is also more accurate.
In one embodiment, determining, in step 102, the target information block vector in the vector database that matches the question vector may specifically be implemented as follows:
determining the similarity between the question vector and each information block vector in the vector database; when there are at least two first information block vectors, that is, information block vectors whose similarity is greater than a preset similarity, outputting the information blocks corresponding to the first information block vectors; receiving a selection instruction, input by a user, for selecting a target information block; and in response to the selection instruction, determining the first information block vector corresponding to the target information block as the target information block vector.
For example, the similarity between the question vector and each information block vector in the vector database is calculated and compared with the preset similarity. When two or more first information block vectors have a similarity greater than the preset similarity, the information blocks corresponding to all of them contain associated information related to the question to be answered. In that case, the information blocks corresponding to the first information block vectors are displayed through a control, so that the user can select a target information block from the displayed information blocks according to their needs. When the user decides on a target information block and clicks the position corresponding to it, the electronic device obtains the selection instruction input by the user for selecting the target information block, and determines the first information block vector corresponding to the target information block indicated by the selection instruction as the target information block vector.
In this embodiment, when two or more first information block vectors have a similarity greater than the preset similarity, the information block corresponding to each of them is displayed so that the user can select a target information block according to their needs. The target knowledge finally determined from the target information block therefore better matches the user's needs, which improves the accuracy of knowledge question answering.
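The formatting step can be sketched with an ordinary string template. The template wording, the placeholder names `context` and `question`, and the function `format_prompt` are hypothetical; the text above only states that the template has a known-information part, prompt content based on the known information, and a question part. Sending the resulting prompt to a large language model is left out because no particular model or client is specified.

```python
# Hypothetical preset prompt template: known-information part, prompt content, question part.
PROMPT_TEMPLATE = (
    "Known information:\n{context}\n\n"
    "Based on the known information above, answer the user's question concisely. "
    "If the answer cannot be found in it, say so.\n\n"
    "Question: {question}"
)


def format_prompt(target_document_block: str, question: str, template: str = PROMPT_TEMPLATE) -> str:
    """Insert the target document block and the question into the template's two slots."""
    # str.format does not re-parse the substituted values, so braces inside the
    # document block text are kept literally.
    return template.format(context=target_document_block, question=question)
```

The string returned by `format_prompt` plays the role of the new prompt that is input into the large language model.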
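A minimal sketch of the threshold-and-select logic follows. It assumes cosine similarity as the similarity measure (the text does not name one) and models the user's selection instruction as a callback `choose` that receives the displayed candidate texts and returns the position of the chosen one. The behaviour when only one vector, or none, exceeds the preset similarity is not described above; the sketch assumes a single candidate is used directly and that no match returns `None`. All names are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def match_target_vector(question_vec, block_vectors, block_texts, threshold, choose):
    """Return the index of the target information block vector, or None.

    choose: stand-in for the user's selection instruction; it receives the texts of the
    candidate information blocks and returns the position of the chosen one.
    """
    # First information block vectors: similarity greater than the preset similarity.
    candidates = [i for i, v in enumerate(block_vectors) if cosine(question_vec, v) > threshold]
    if len(candidates) >= 2:
        picked = choose([block_texts[i] for i in candidates])  # display candidates, wait for the user
        return candidates[picked]
    return candidates[0] if candidates else None  # single or no match: assumed behaviour
```

In an interactive system, `choose` would render the candidate information blocks in a control and resolve when the user clicks one.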
In one embodiment, converting, in step 101, the question to be answered into the question vector may specifically be implemented as follows:
inputting the question to be answered into a vector conversion model to obtain the question vector output by the vector conversion model; the vector conversion model is obtained by training an initial vector conversion model based on question samples and label information of the question samples, where the label information represents the vector sample corresponding to each question sample.
For example, the question to be answered is input into the vector conversion model, which performs word embedding on the question and outputs the question vector.
It should be noted that the vector conversion model is trained as follows. A plurality of question samples are input into an initial vector conversion model, which performs feature analysis on them and outputs a predicted question vector for each question sample. A loss function is constructed from the predicted question vectors and the label information representing the vector samples corresponding to the question samples, and the model parameters of the initial vector conversion model are adjusted based on the loss function until a convergence condition is reached, yielding the vector conversion model.
It should be noted that the initial vector conversion model may specifically be a deep neural network or a convolutional neural network; the present invention does not limit this.
In this embodiment, the question to be answered is converted into a question vector by the vector conversion model, which improves the efficiency of question-vector conversion and thus the answering efficiency of knowledge question answering.
Fig. 4 is a flowchart of a vector database construction method according to an embodiment of the present invention. As shown in fig. 4, the vector database construction method includes the following steps:
Step 401, dividing the document based on event integrity to obtain at least one document block.
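The trained vector conversion model is not specified further, so the sketch below swaps in a plainly different technique, a feature-hashing bag-of-words embedder, purely to show the interface (question text in, fixed-length vector out). It is not a trained model and will not capture semantics the way the described model is meant to; the function name `embed` and the dimension of 64 are assumptions.

```python
import hashlib
import math
import re


def embed(text: str, dim: int = 64) -> list[float]:
    """Stand-in for the vector conversion model: hashed bag-of-words, L2-normalised.

    Not a trained model; it only reproduces the interface of one.
    """
    vec = [0.0] * dim
    for token in re.findall(r"\w+", text.lower()):
        # md5 gives a hash that is stable across runs (unlike Python's built-in hash()).
        h = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec
```

Because the output is normalised, the dot product of two such vectors equals their cosine similarity, which keeps it compatible with the similarity matching of step 102.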
Step 402, obtaining, for each document block, at least one preset question corresponding to the document block.
Step 403, splicing each document block with all of its corresponding preset questions to obtain a new document block.
Step 404, segmenting the new document block based on a preset semantic segmenter to obtain at least one information block corresponding to the new document block.
Step 405, converting each information block into an information block vector.
Step 406, constructing the vector database based on each information block vector.
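Steps 402 to 406 can be sketched as follows. The sketch assumes that the document blocks and their identifiers come from a document database such as the one built in steps 111 to 114, that the preset questions for each block are already available as a list, that the preset questions are placed before the block text when splicing, and that a newline serves as the preset semantic segmenter. All of these, along with the function name `build_vector_entries`, are illustrative assumptions; the embedding function is passed in so that any vector conversion model can be used.

```python
from typing import Callable

SEGMENTER = "\n"  # hypothetical preset semantic segmenter


def build_vector_entries(
    document_db: dict[str, str],
    preset_questions: dict[str, list[str]],
    embed: Callable[[str], list[float]],
    segmenter: str = SEGMENTER,
) -> list[tuple[list[float], str, str]]:
    """Return (information block vector, document block id, information block text) triples."""
    entries = []
    for block_id, block_text in document_db.items():
        questions = preset_questions.get(block_id, [])        # step 402
        new_block = segmenter.join(questions + [block_text])   # step 403: splice questions and block
        for info_block in new_block.split(segmenter):          # step 404: cut at the segmenter
            info_block = info_block.strip()
            if info_block:
                entries.append((embed(info_block), block_id, info_block))  # step 405
    return entries  # step 406: load these triples into the vector database
```

Each returned triple can then be stored in the vector database, so that a question close to one of the preset questions retrieves the identifier of the document block that the preset question was written for.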
In the vector database construction method provided by the invention, the document is divided based on event integrity; for each resulting document block, at least one preset question is obtained; each document block is spliced with all of its preset questions to obtain a new document block; the new document block is segmented at a preset semantic segmenter into at least one information block; each information block is converted into an information block vector; and the vector database is finally constructed from the information block vectors. The vector database makes it convenient to search for the target information block vector matching the question vector of a question to be answered, after which the target knowledge for the question is determined from the target document block corresponding to that vector. Because the information block vectors in the vector database are derived from document blocks, and the document blocks are obtained by dividing the document based on event integrity, the target document block corresponding to the determined target information block vector can express a complete event. The associated information obtained for the question to be answered is therefore more complete, which can improve the accuracy of knowledge question answering.
The knowledge question-answering device provided by the invention is described below; the device described below and the knowledge question-answering method described above may be cross-referenced.
Fig. 5 is a schematic structural diagram of a knowledge question-answering device provided by an embodiment of the present invention. As shown in fig. 5, the knowledge question-answering device 500 includes a first conversion unit 501, a first determination unit 502, a second determination unit 503, a third determination unit 504, and an output unit 505, wherein:
a first conversion unit 501, configured to convert a question to be answered into a question vector;
a first determining unit 502, configured to determine, in a vector database, a target information block vector that matches the question vector, where at least one information block vector is stored in the vector database; the information block vector is determined based on a document block; the document blocks are obtained by dividing the document based on event integrity;
a second determining unit 503, configured to determine the target document block corresponding to the target information block vector;
a third determining unit 504, configured to determine, based on the target document block, target knowledge corresponding to the to-be-answered question;
an output unit 505 for outputting the target knowledge.
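One hypothetical way to map device 500 onto software is a class whose units are injected callables. The sketch assumes that the first determining unit returns the position of the target information block vector (or `None`), that a parallel list records the document block identifier of each stored vector, and that the third determining unit is any function taking the target document block and the question, such as a wrapper around a large language model. The class and field names are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class KnowledgeQADevice:
    """Hypothetical software rendering of device 500; each unit is an injected callable."""
    convert: Callable[[str], list[float]]            # first conversion unit 501
    match: Callable[[list[float]], Optional[int]]    # first determining unit 502: index of target vector
    vector_block_ids: list[str]                      # document block id stored with each vector
    document_db: dict[str, str]                      # document block id -> document block text
    answer: Callable[[str, str], str]                # third determining unit 504, e.g. an LLM call

    def determine_target_block(self, vector_index: int) -> str:
        # Second determining unit 503: vector -> block identifier -> document block.
        return self.document_db[self.vector_block_ids[vector_index]]

    def ask(self, question: str) -> Optional[str]:
        index = self.match(self.convert(question))
        if index is None:
            return None  # no matching information block vector (assumed behaviour)
        target_block = self.determine_target_block(index)
        # Output unit 505: the target knowledge is returned to the caller.
        return self.answer(target_block, question)
```

Injecting the units keeps the orchestration independent of which embedding model, vector store, or language model is used.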
The knowledge question-answering device provided by the invention converts the acquired question to be answered into a question vector, first determines the target information block vector matching the question vector in the vector database, then determines the target document block corresponding to that vector, takes the target document block as associated knowledge of the question to be answered, and determines the target knowledge for the question based on that associated knowledge. Because the document blocks are obtained by dividing the document based on event integrity, the target document block corresponding to the determined target information block vector can express a complete event, so the associated information obtained for the question to be answered is more complete and the accuracy of knowledge question answering can be improved.
Based on any one of the above embodiments, the vector database further stores an identifier of a document block to which each information block vector belongs; the second determining unit 503 is specifically configured to:
determining the identification of a target document block to which the target information block vector belongs in the vector database;
determining, in a document database, the target document block corresponding to the identifier of the target document block; the document database stores the correspondence between document blocks and their identifiers.
Based on any of the above embodiments, the knowledge question-answering apparatus 500 further includes:
an obtaining unit, configured to obtain, for each document block, at least one preset question corresponding to the document block;
a splicing unit, configured to splice each document block with all of its corresponding preset questions to obtain a new document block;
the segmentation unit is used for segmenting the new document block based on a preset semantic segmenter to obtain at least one information block corresponding to the new document block;
a second conversion unit configured to convert each of the information blocks into an information block vector, respectively;
the first construction unit is used for constructing the vector database based on the information block vectors and the identification of the document block to which the information block vector belongs.
Based on any of the above embodiments, the knowledge question-answering apparatus 500 further includes:
the analysis unit is used for carrying out semantic analysis on each sentence in the document to obtain a semantic analysis result corresponding to each sentence;
a fourth determining unit configured to determine at least one target sentence characterizing the integrity of the same event based on each of the semantic analysis results;
a fifth determining unit for dividing the at least one target sentence into a document block;
and a second construction unit, configured to construct the document database based on the document blocks and the identifiers of the document blocks.
Based on any of the above embodiments, the third determining unit 504 is specifically configured to:
inputting the target document block and the questions to be answered into a large language model, and carrying out semantic understanding on the target document block based on the questions to be answered through the large language model to obtain target knowledge corresponding to the questions to be answered.
Based on any of the above embodiments, the first determining unit 502 is specifically configured to:
determining the similarity between the question vector and each information block vector in the vector database;
outputting information blocks corresponding to the first information block vectors under the condition that the number of the first information block vectors corresponding to the similarity larger than the preset similarity is at least two;
receiving a selection instruction, input by a user, for selecting a target information block;
and responding to the selection instruction, and determining a first information block vector corresponding to the target information block as the target information block vector.
Based on any of the above embodiments, the first conversion unit 501 is specifically configured to:
inputting the question to be answered into a vector conversion model to obtain the question vector output by the vector conversion model; the vector conversion model is obtained by training an initial vector conversion model based on question samples and label information of the question samples, where the label information represents the vector sample corresponding to each question sample.
The embodiment of the invention also provides a vector database construction device, which comprises:
the dividing unit is used for dividing the document based on event integrity to obtain at least one document block;
an obtaining unit, configured to obtain, for each document block, at least one preset question corresponding to the document block;
a splicing unit, configured to splice each document block with all of its corresponding preset questions to obtain a new document block;
a segmentation unit, configured to segment the new document block based on a preset semantic segmenter to obtain at least one information block corresponding to the new document block;
a second conversion unit, configured to convert each of the information blocks into an information block vector;
a first construction unit for constructing the vector database based on each of the information block vectors.
Fig. 6 is a schematic diagram of the physical structure of an electronic device according to an embodiment of the present invention. As shown in fig. 6, the electronic device may include a processor 610, a communication interface (Communications Interface) 620, a memory 630, and a communication bus 640, where the processor 610, the communication interface 620, and the memory 630 communicate with one another via the communication bus 640. The processor 610 may invoke logic instructions in the memory 630 to perform a knowledge question-answering method that includes: converting the questions to be answered into question vectors;
determining, in a vector database, a target information block vector that matches the question vector, where at least one information block vector is stored in the vector database; the information block vector is determined based on a document block; the document blocks are obtained by dividing the document based on event integrity;
determining the target document block corresponding to the target information block vector;
determining target knowledge corresponding to the to-be-answered question based on the target document block;
and outputting the target knowledge.
Alternatively, the processor 610 may invoke logic instructions in the memory 630 to perform a vector database construction method comprising:
dividing the document based on event integrity to obtain at least one document block;
for each document block, acquiring at least one preset question corresponding to the document block;
splicing each document block with all of its corresponding preset questions to obtain a new document block;
segmenting the new document block based on a preset semantic segmenter to obtain at least one information block corresponding to the new document block;
converting each information block into an information block vector;
the vector database is constructed based on each of the information block vectors.
Further, when implemented in the form of software functional units and sold or used as an independent product, the logic instructions in the memory 630 may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
In another aspect, the present invention also provides a computer program product, where the computer program product includes a computer program, where the computer program can be stored on a non-transitory computer readable storage medium, and when the computer program is executed by a processor, the computer can execute the knowledge question-answering method provided by the above methods, and the method includes: converting the questions to be answered into question vectors;
determining, in a vector database, a target information block vector that matches the question vector, where at least one information block vector is stored in the vector database; the information block vector is determined based on a document block; the document blocks are obtained by dividing the document based on event integrity;
determining the target document block corresponding to the target information block vector;
determining target knowledge corresponding to the to-be-answered question based on the target document block;
and outputting the target knowledge.
Alternatively, when the program instructions are executed by a computer, the computer may implement the method of:
dividing the document based on event integrity to obtain at least one document block;
for each document block, acquiring at least one preset question corresponding to the document block;
splicing each document block with all of its corresponding preset questions to obtain a new document block;
segmenting the new document block based on a preset semantic segmenter to obtain at least one information block corresponding to the new document block;
converting each information block into an information block vector;
the vector database is constructed based on each of the information block vectors.
In yet another aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, is implemented to perform the knowledge question-answering method provided by the above methods, the method comprising: converting the questions to be answered into question vectors;
determining, in a vector database, a target information block vector that matches the question vector, where at least one information block vector is stored in the vector database; the information block vector is determined based on a document block; the document blocks are obtained by dividing the document based on event integrity;
determining the target document block corresponding to the target information block vector;
determining target knowledge corresponding to the to-be-answered question based on the target document block;
and outputting the target knowledge.
Alternatively, the computer program when executed by a processor implements the method of:
dividing the document based on event integrity to obtain at least one document block;
for each document block, acquiring at least one preset question corresponding to the document block;
splicing each document block with all of its corresponding preset questions to obtain a new document block;
segmenting the new document block based on a preset semantic segmenter to obtain at least one information block corresponding to the new document block;
converting each information block into an information block vector;
the vector database is constructed based on each of the information block vectors.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by software plus a necessary general hardware platform, or, of course, by hardware. Based on this understanding, the foregoing technical solution in essence, or the part of it that contributes to the prior art, may be embodied in the form of a software product. The software product may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the embodiments or in some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (13)

1. A knowledge question-answering method, comprising:
converting the questions to be answered into question vectors;
determining, in a vector database, a target information block vector that matches the question vector, wherein at least one information block vector is stored in the vector database; the information block vector is determined based on a document block; the document blocks are obtained by dividing a document based on event integrity;
determining a target document block corresponding to the target information block vector;
determining target knowledge corresponding to the to-be-answered question based on the target document block;
and outputting the target knowledge.
2. The knowledge question-answering method according to claim 1, wherein the vector database further stores therein an identification of a document block to which each of the information block vectors belongs;
the determining the target document block corresponding to the target information block vector comprises the following steps:
determining the identification of a target document block to which the target information block vector belongs in the vector database;
determining, in a document database, the target document block corresponding to the identifier of the target document block; the document database stores the correspondence between document blocks and their identifiers.
3. The knowledge question-answering method according to claim 2, wherein the method further comprises:
for each document block, acquiring at least one preset question corresponding to the document block;
splicing each document block with all of its corresponding preset questions to obtain a new document block;
segmenting the new document block based on a preset semantic segmenter to obtain at least one information block corresponding to the new document block;
converting each information block into an information block vector;
and constructing the vector database based on the information block vectors and the identification of the document blocks to which the information block vectors belong.
4. The knowledge question-answering method according to claim 2, wherein the method further comprises:
carrying out semantic analysis on each sentence in the document to obtain a semantic analysis result corresponding to each sentence;
determining at least one target sentence which characterizes the same event integrity based on each semantic analysis result;
dividing the at least one target sentence into a document block;
and constructing the document database based on the document blocks and the identifications of the document blocks.
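Claim 4 groups sentences by the event they describe. The sketch below assumes, hypothetically, that the semantic analysis step has already produced one event label per sentence; how those labels are computed is not shown. It merges consecutive sentences sharing a label into one document block and assigns hypothetical identifications of the form `blk-N`.

```python
from itertools import groupby


def build_document_database(sentences: list, event_labels: list) -> dict:
    """sentences: list of str; event_labels: one label per sentence, assumed to be
    the output of a separate semantic analysis step.

    Consecutive sentences describing the same event form one document block.
    Returns {block_identification: block_text}.
    """
    if len(sentences) != len(event_labels):
        raise ValueError("need exactly one event label per sentence")
    document_db = {}
    pairs = zip(event_labels, sentences)
    # groupby only merges *adjacent* items with equal keys.
    for index, (_, group) in enumerate(groupby(pairs, key=lambda p: p[0])):
        document_db[f"blk-{index}"] = " ".join(sentence for _, sentence in group)
    return document_db
```

Grouping only adjacent sentences is a design assumption that events are described contiguously; collecting all sentences with the same label regardless of position would also fit the claim wording.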
5. The knowledge question-answering method according to any one of claims 1-4, wherein the determining, based on the target document block, target knowledge corresponding to the question to be answered includes:
inputting the target document block and the question to be answered into a large language model, and performing, by the large language model, semantic understanding of the target document block based on the question to be answered, to obtain the target knowledge corresponding to the question to be answered.
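The large-language-model step of claim 5 amounts to putting the retrieved document block and the question into one prompt. The sketch below shows that assembly only. The prompt wording is hypothetical, and `llm` is a placeholder callable standing in for whatever model client is used; no specific model API is implied.

```python
from typing import Callable

# Hypothetical prompt wording; the patent does not specify a template.
PROMPT_TEMPLATE = (
    "Answer the question using only the reference text below.\n"
    "Reference text:\n{block}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def answer_with_llm(question: str, target_block: str, llm: Callable[[str], str]) -> str:
    """Combine the target document block and the question into one prompt and
    return the model's answer as the target knowledge."""
    prompt = PROMPT_TEMPLATE.format(block=target_block, question=question)
    return llm(prompt).strip()
```

Passing the model as a callable keeps the sketch independent of any particular LLM client and makes it easy to test with a stub.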
6. The knowledge question-answering method according to claim 3, wherein the determining, in a vector database, of a target information block vector that matches the question vector comprises:
determining the similarity between the question vector and each information block vector in the vector database;
in the case that there are at least two first information block vectors, namely information block vectors whose similarity is greater than a preset similarity, outputting the information blocks corresponding to the first information block vectors;
receiving a selection instruction, input by a user, for selecting a target information block;
and in response to the selection instruction, determining the first information block vector corresponding to the target information block as the target information block vector.
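The matching procedure of claim 6 can be sketched as below. Cosine similarity is an assumed choice, since the claim says only "similarity". The `choose` callback is a hypothetical stand-in for the user's selection instruction. The handling of zero or exactly one candidate is also an assumption, because the claim covers only the case of two or more.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_target_vector(
    question_vec: Sequence[float],
    entries: List[Tuple[List[float], str]],
    threshold: float,
    choose: Callable[[List[str]], int],
) -> Optional[List[float]]:
    """entries: (information_block_vector, information_block_text) pairs."""
    # Keep vectors whose similarity exceeds the preset threshold, most similar first.
    scored = [(cosine_similarity(question_vec, vec), vec, text) for vec, text in entries]
    candidates = sorted((c for c in scored if c[0] > threshold), key=lambda c: c[0], reverse=True)
    if not candidates:
        return None  # assumption: no match above the threshold
    if len(candidates) == 1:
        return candidates[0][1]  # assumption: a single match is used directly
    # Two or more matches: output their information blocks and let the user pick one.
    index = choose([text for _, _, text in candidates])
    return candidates[index][1]
```

Asking the user only when several blocks clear the threshold avoids an extra interaction in the common case while resolving genuinely ambiguous questions.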
7. The knowledge question-answering method according to any one of claims 1-4, wherein the converting a question to be answered into a question vector comprises:
inputting the question to be answered into a vector conversion model to obtain the question vector output by the vector conversion model, wherein the vector conversion model is obtained by training an initial vector conversion model based on question samples and label information of the question samples, the label information representing the vector sample corresponding to each question sample.
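Claim 7 trains a vector conversion model on (question sample, label vector) pairs. The patent does not name a model or loss. The sketch below therefore swaps in the simplest possible choices purely for illustration: word-count features over a fixed vocabulary, a linear map `W`, and stochastic gradient descent on squared error. `featurize`, `convert`, and `train_vector_conversion_model` are hypothetical names.

```python
import random
from typing import List, Sequence, Tuple


def featurize(question: str, vocab: Sequence[str]) -> List[float]:
    """Hypothetical input features: per-word counts over a fixed vocabulary."""
    tokens = question.lower().split()
    return [float(tokens.count(word)) for word in vocab]


def convert(question: str, W: List[List[float]], vocab: Sequence[str]) -> List[float]:
    """Apply the linear vector conversion model: y = W x."""
    x = featurize(question, vocab)
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def train_vector_conversion_model(
    samples: List[Tuple[str, List[float]]],
    vocab: Sequence[str],
    out_dim: int,
    epochs: int = 200,
    lr: float = 0.05,
    seed: int = 0,
) -> List[List[float]]:
    """Fit W so that convert(question) approaches each sample's label vector.

    Plain SGD on 0.5 * ||W x - label||^2; the gradient with respect to
    W[i][j] is (y_i - label_i) * x_j.
    """
    rng = random.Random(seed)
    # Initial vector conversion model: small random weights.
    W = [[rng.uniform(-0.1, 0.1) for _ in vocab] for _ in range(out_dim)]
    for _ in range(epochs):
        for question, label in samples:
            x = featurize(question, vocab)
            y = convert(question, W, vocab)
            for i in range(out_dim):
                err = y[i] - label[i]
                for j, xj in enumerate(x):
                    W[i][j] -= lr * err * xj
    return W
```

In practice the initial model would more likely be a pretrained text encoder fine-tuned on the labelled question samples; the linear model here only makes the "train on samples and label vectors" step concrete.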
8. A method of vector database construction, comprising:
dividing a document based on event integrity to obtain at least one document block;
for each document block, acquiring at least one preset question corresponding to the document block;
concatenating the document block with all preset questions corresponding to it to obtain a new document block;
dividing the new document block based on a preset semantic delimiter to obtain at least one information block corresponding to the new document block;
converting each information block into an information block vector;
and constructing the vector database based on each of the information block vectors.
9. A knowledge question-answering apparatus, comprising:
a first conversion unit, configured to convert a question to be answered into a question vector;
a first determining unit, configured to determine, in a vector database, a target information block vector matching the question vector, wherein at least one information block vector is stored in the vector database, the information block vector is determined based on a document block, and the document blocks are obtained by dividing documents based on event integrity;
a second determining unit, configured to determine a target document block corresponding to the target information block vector;
a third determining unit, configured to determine, based on the target document block, target knowledge corresponding to the question to be answered;
and an output unit, configured to output the target knowledge.
10. A vector database construction apparatus, comprising:
a dividing unit, configured to divide a document based on event integrity to obtain at least one document block;
an obtaining unit, configured to obtain, for each document block, at least one preset question corresponding to the document block;
a splicing unit, configured to concatenate the document block with all preset questions corresponding to it to obtain a new document block;
a segmentation unit, configured to segment the new document block based on a preset semantic delimiter to obtain at least one information block corresponding to the new document block;
a second conversion unit configured to convert each of the information blocks into an information block vector, respectively;
a first construction unit for constructing the vector database based on each of the information block vectors.
11. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the knowledge question-answering method according to any one of claims 1 to 7, or the vector database construction method according to claim 8, when the program is executed by the processor.
12. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the knowledge question-answering method according to any one of claims 1 to 7, or implements the vector database construction method according to claim 8.
13. A computer program product comprising a computer program which, when executed by a processor, implements the knowledge question-answering method according to any one of claims 1 to 7, or implements the vector database construction method according to claim 8.
CN202311566499.8A 2023-11-22 2023-11-22 Knowledge question-answering method, vector database construction method and device Pending CN117556057A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311566499.8A CN117556057A (en) 2023-11-22 2023-11-22 Knowledge question-answering method, vector database construction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311566499.8A CN117556057A (en) 2023-11-22 2023-11-22 Knowledge question-answering method, vector database construction method and device

Publications (1)

Publication Number Publication Date
CN117556057A (en) 2024-02-13

Family

ID=89812252

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311566499.8A Pending CN117556057A (en) 2023-11-22 2023-11-22 Knowledge question-answering method, vector database construction method and device

Country Status (1)

Country Link
CN (1) CN117556057A (en)

Similar Documents

Publication Publication Date Title
WO2020147238A1 (en) Keyword determination method, automatic scoring method, apparatus and device, and medium
CN112487139B (en) Text-based automatic question setting method and device and computer equipment
CN109408821B (en) Corpus generation method and device, computing equipment and storage medium
CN114757176B (en) Method for acquiring target intention recognition model and intention recognition method
CN111309887B (en) Method and system for training text key content extraction model
CN112417158A (en) Training method, classification method, device and equipment of text data classification model
CN116737908A (en) Knowledge question-answering method, device, equipment and storage medium
CN112131881A (en) Information extraction method and device, electronic equipment and storage medium
CN112182186A (en) Intelligent customer service operation method, device and system
CN112395887A (en) Dialogue response method, dialogue response device, computer equipment and storage medium
CN109408175B (en) Real-time interaction method and system in general high-performance deep learning calculation engine
CN114186040A (en) Operation method of intelligent robot customer service
CN110377706B (en) Search sentence mining method and device based on deep learning
CN116701604A (en) Question and answer corpus construction method and device, question and answer method, equipment and medium
CN110765241A (en) Super-outline detection method and device for recommendation questions, electronic equipment and storage medium
CN115934904A (en) Text processing method and device
CN112860873B (en) Intelligent response method, device and storage medium
CN117556057A (en) Knowledge question-answering method, vector database construction method and device
CN115114404A (en) Question and answer method and device for intelligent customer service, electronic equipment and computer storage medium
CN114186041A (en) Answer output method
CN117453895B (en) Intelligent customer service response method, device, equipment and readable storage medium
CN111309990A (en) Statement response method and device
CN115147131A (en) Training method of dialogue behavior classification model, and dialogue log processing method and device
CN117763099A (en) Interaction method and device of intelligent customer service system
CN114757198A (en) Similar method and device, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination