CN113254612A - Knowledge question-answering processing method, device, equipment and storage medium - Google Patents

Knowledge question-answering processing method, device, equipment and storage medium

Info

Publication number
CN113254612A
Authority
CN
China
Prior art keywords
question
answer
statement information
category
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110565939.2A
Other languages
Chinese (zh)
Inventor
孙泽烨
李炫�
陈思姣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Life Insurance Company of China Ltd
Original Assignee
Ping An Life Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Life Insurance Company of China Ltd
Priority to CN202110565939.2A
Publication of CN113254612A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/194 Calculation of difference between files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a knowledge question-answering processing method, device, equipment and storage medium, relates to the technical field of language processing, and mainly aims to solve the low efficiency of existing processing of unconventional knowledge questions and answers. The method comprises the following steps: acquiring first category statement information to be subjected to knowledge question-answering processing, and parsing the data source of the first category statement information; matching a trained unified language model according to the parsed data source, wherein the unified language model is obtained by training on second category statement information in different data sources, and the second category statement information has a replacement relationship with the first category statement information; and performing question-answer processing on the first category statement information according to the matched unified language model to generate question-answer information for the first category statement information. The method is mainly used for knowledge question-answering processing.

Description

Knowledge question-answering processing method, device, equipment and storage medium
Technical Field
The present invention relates to the field of language processing technologies, and in particular, to a method, an apparatus, a device, and a storage medium for processing a knowledge question and answer.
Background
With the rapid development of natural language technology, FAQ (Frequently Asked Questions) answering has become intelligent, and more and more enterprises use FAQ question-answering systems to resolve the various problems of online users without human intervention. Common questions in an FAQ question-answering system are those that users raise frequently; methods such as big data processing and machine learning already answer them thoroughly and accurately, but the answering accuracy for uncommon questions in an FAQ question-answering system remains low.
At present, unconventional question-answer sentences are generally handled by collecting them, feeding them back to the system background, and writing the answers manually. However, such questions arise in very large numbers, so this consumes a large amount of human resources, and manual writing can hardly cover all questions and answers, which reduces the efficiency of knowledge question-answering processing.
Disclosure of Invention
In view of the above, the present invention provides a knowledge question-answering processing method, apparatus, device and storage medium, mainly aiming to solve the low efficiency of existing processing of unconventional knowledge questions and answers.
According to an aspect of the present invention, there is provided a knowledge question-answering processing method, including:
acquiring first category statement information to be subjected to knowledge question-answering processing, and parsing the data source of the first category statement information;
matching a trained unified language model according to the parsed data source, wherein the unified language model is obtained by training on second category statement information in different data sources, and the second category statement information has a replacement relationship with the first category statement information;
and performing question-answer processing on the first category statement information according to the matched unified language model to generate question-answer information for the first category statement information.
According to another aspect of the present invention, there is provided a knowledge question-answering processing apparatus including:
an acquisition module, configured to acquire first category statement information to be subjected to knowledge question-answering processing and to parse the data source of the first category statement information;
a matching module, configured to match a trained unified language model according to the parsed data source, wherein the unified language model is obtained by training on second category statement information in different data sources, and the second category statement information has a replacement relationship with the first category statement information;
and a processing module, configured to perform question-answer processing on the first category statement information according to the matched unified language model to generate question-answer information for the first category statement information.
According to another aspect of the present invention, there is provided a storage medium having at least one executable instruction stored therein, the executable instruction causing a processor to perform operations corresponding to the above knowledge question-answering processing method.
According to still another aspect of the present invention, there is provided a computer device including: a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction causes the processor to perform operations corresponding to the above knowledge question-answering processing method.
Through the above technical solution, the technical solution provided by the embodiments of the invention has at least the following advantages:
Compared with the prior art, the embodiment of the invention acquires the first category statement information to be subjected to knowledge question-answering processing and parses the data source of the first category statement information; matches a trained unified language model according to the parsed data source, wherein the unified language model is obtained by training on second category statement information in different data sources, and the second category statement information has a replacement relationship with the first category statement information; and performs question-answer processing on the first category statement information according to the matched unified language model to generate question-answer information for the first category statement information. In this way, question answering fully covers unconventional statement information, the manpower and material resources needed to identify question-answer sentences are greatly reduced, and the efficiency of knowledge question-answering processing is improved.
The foregoing is only an overview of the technical solution of the present invention. Specific embodiments are described below so that the technical means of the invention can be understood more clearly and so that the above and other objects, features and advantages of the invention become more apparent.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a flow chart of a knowledge question-answering processing method according to an embodiment of the present invention;
FIG. 2 is a diagram of the UNILM model network structure according to an embodiment of the present invention;
FIG. 3 is a block diagram of a knowledge question-answering processing apparatus according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The embodiment of the invention provides a knowledge question-answering processing method, as shown in figure 1, the method comprises the following steps:
101. Acquire first category statement information to be subjected to knowledge question-answering processing, and parse the data source of the first category statement information.
In the embodiment of the invention, the first category statement information to be subjected to knowledge question-answering processing is a statement that has already undergone one round of conventional question-answer identification. That is, the first category statement information is statement information for which no question-answer information was found in a preset question-answer library, and the second category statement information is statement information for which question-answer information was found in the preset question-answer library. The preset question-answer library stores a large amount of question-answer information corresponding to different statement information, obtained by computation with the trained unified language model. Specifically, a question-answer sentence for which a corresponding (question, answer) pair can be identified is a conventional question-answer sentence, and a sentence for which question-answer information cannot be identified from the established knowledge question-answer library is treated directly as an unconventional question-answer sentence. The unconventional question-answer sentences are therefore the first category statement information, and the conventional question-answer sentences are the second category statement information. The data source represents where the first category statement information to be identified is stored, and includes but is not limited to a question-answer knowledge base, a knowledge graph base, a product list database and a product clause database. The data source may be parsed, for example, from the storage path of the first category statement information; the embodiment of the invention does not specifically limit the parsing method.
It should be noted that the knowledge question answering in the embodiment of the invention is applicable to a product transaction application program as well as to application scenarios with other requirements, such as web question answering, and the knowledge question-answering processing of the embodiment can be performed on any sentence that cannot be identified by conventional knowledge question answering. The conventional question-answer sentences are the large number of question-answer sentences for which (question, answer) pairs have been obtained through big data analysis, and the unconventional question-answer sentences are the large number of question-answer sentences that cannot yet be processed accurately. Both are consultation questions that users habitually raise about different services and products, and the embodiment of the invention does not specifically limit them.
102. Match a trained unified language model according to the parsed data source.
Because different data sources have different data storage structures, in the embodiment of the invention a corresponding unified language model is trained on the training sample set of each data source. The unified language model is obtained by training on the second category statement information in the different data sources, which means that different data sources correspond to different unified language models, so the corresponding unified language model can be matched directly according to the specific data source that was parsed. In addition, because the second category statement information consists of the conventional question-answer sentences, the unified language model trained on the conventional question-answer sentences serves as a replacement model for performing question-answer recognition on the unconventional question-answer sentences. Specifically, the second category statement information has a replacement relationship with the first category statement information. The replacement relationship means that the first category statement information replaces the second category statement information as the input parameter of the unified language model; that is, the first category statement information is processed as the input parameter of the unified language model trained on the second category statement information, so as to obtain the question-answer information of the first category statement information.
It should be noted that the first category statement information to be processed may already be stored in the databases of different data sources, or may be expected to be stored in them. When parsing the data source, the data source can therefore be identified from the actual storage path or the expected storage path, which is not specifically limited in the embodiment of the invention.
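As a minimal sketch of steps 101 and 102, the snippet below infers a data source from a storage path and looks up the model registered for it. The directory names (`faq`, `kg`, `lists`, `clauses`), the model identifiers and the function names are all hypothetical; the patent does not prescribe any particular path layout or registry.

```python
from enum import Enum
from pathlib import PurePosixPath


class DataSource(Enum):
    FAQ = "faq"
    KNOWLEDGE_GRAPH = "knowledge_graph"
    PRODUCT_LIST = "product_list"
    PRODUCT_CLAUSE = "product_clause"


# Hypothetical mapping from a top-level storage directory to a data source.
PATH_PREFIX_TO_SOURCE = {
    "faq": DataSource.FAQ,
    "kg": DataSource.KNOWLEDGE_GRAPH,
    "lists": DataSource.PRODUCT_LIST,
    "clauses": DataSource.PRODUCT_CLAUSE,
}

# Hypothetical registry: one trained model identifier per data source.
MODEL_REGISTRY = {
    DataSource.FAQ: "unilm-faq",
    DataSource.KNOWLEDGE_GRAPH: "unilm-knowledge-graph",
    DataSource.PRODUCT_LIST: "unilm-product-list",
    DataSource.PRODUCT_CLAUSE: "unilm-product-clause",
}


def parse_data_source(storage_path: str) -> DataSource:
    """Infer the data source from the first directory of a storage path."""
    parts = PurePosixPath(storage_path).parts
    first = next((p for p in parts if p != "/"), None)
    if first not in PATH_PREFIX_TO_SOURCE:
        raise ValueError(f"cannot infer data source from {storage_path!r}")
    return PATH_PREFIX_TO_SOURCE[first]


def match_model(storage_path: str) -> str:
    """Return the identifier of the model trained for the path's data source."""
    return MODEL_REGISTRY[parse_data_source(storage_path)]
```

The same lookup works for an expected storage path, since only the path string is inspected and the file does not need to exist.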
103. Perform question-answer processing on the first category statement information according to the matched unified language model to generate question-answer information for the first category statement information.
In the embodiment of the invention, once the unified language model has been matched according to the data source of the first category statement information, the first category statement information is fed to the model as its input parameter, and the question-answer information corresponding to the first category statement information is obtained. Because the unified language model trained on the second category statement information is used as the model for processing and identifying the first category statement information, the knowledge question-answer library is expanded, the coverage of unconventional questions and answers is greatly improved, and a large amount of manpower and material resources is saved.
It should be noted that the question-answer sentences in the embodiment of the invention include statement information identifying the question and the answer; therefore, the question-answer information obtained by processing with the unified language model includes the (question, answer) pair corresponding to the statement information.
In an embodiment of the invention, in order to raise the answer coverage of unconventional questions and answers on the basis of the conventional question-answer library, the data sources include a question-answer knowledge base, a knowledge graph base, a product list database and a product clause database, and before the trained unified language model is matched according to the parsed data source, the method further includes: acquiring a second category statement information training sample set from each of the question-answer knowledge base, the knowledge graph base, the product list database and the product clause database; training a unified language model built on a language network with the second category statement information training sample sets, to obtain trained unified language models respectively suited to the question-answer knowledge base, the knowledge graph base, the product list database and the product clause database; and establishing a replacement link between the input parameters of the unified language model and the first category statement information.
Specifically, in the embodiment of the invention the data sources include at least a question-answer knowledge base, a knowledge graph base, a product list database and a product clause database. So that the first category statement information can directly replace the second category statement information as the input parameter of the trained unified language model, the unified language model must be trained in advance. The question-answer knowledge base, the knowledge graph base, the product list database and the product clause database each store a training sample set of second category statement information, namely the second category statement information training sample set. A unified language model (UNIfied pre-trained Language Model, UNILM) is then trained with the second category statement information training sample sets of the different data sources: a language network is first constructed, as shown in FIG. 2, and model training is then performed with the training sample set. After training is finished, a replacement link between the input parameters of the unified language model and the first category statement information is established so that the first category statement information can conveniently be substituted as the input parameter, and the models trained for the different data sources can be used directly when first category statement information is identified.
It should be noted that the training process of the UNILM model includes: 1. defining a loss function for the model, e.g., cross entropy; 2. updating the model parameters by performing gradient descent on the loss function; 3. completing model training when the loss function falls below a threshold. The structure of the model is as follows: from bottom to top, the input passes through an embedding layer -> Transformer layers -> an output layer to produce the final model output. All input is in the form of text sentences. The text is split into words and converted into tokens (each token corresponds to one word or punctuation mark); the embedding layer then maps the tokens to vectors, the Transformer layers process those vectors, and the output layer computes the output text (question, answer) pair, where each token of the output pair is the token with the maximum probability value. In addition, the model input parameter x is a text sequence, which may be a single text segment or a pair of text segments. In the training process the main network consists of 24 Transformer layers: the input vectors {x_i} are packed into $H^0 = [x_1, \ldots, x_{|x|}]$ and passed through the 24-layer Transformer network for model training.
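The three training steps above (cross-entropy loss, gradient descent, stop below a threshold) can be illustrated on a deliberately tiny model. The sketch below trains a linear softmax classifier in plain Python; it is not UNILM and has no Transformer layers, and the learning rate, threshold and toy data are assumptions chosen only so the loop terminates quickly. `predict` mirrors the "token with the maximum probability value" rule by taking the argmax.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_for(weights, x):
    """Linear scores, one per class."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def train(samples, num_features, num_classes,
          lr=0.5, threshold=0.05, max_steps=10_000):
    """Gradient descent on mean cross entropy; stop once loss < threshold."""
    weights = [[0.0] * num_features for _ in range(num_classes)]
    loss = float("inf")
    for _ in range(max_steps):
        grads = [[0.0] * num_features for _ in range(num_classes)]
        loss = 0.0
        for x, y in samples:
            probs = softmax(logits_for(weights, x))
            loss += -math.log(probs[y])          # step 1: cross entropy
            for c in range(num_classes):
                # d(loss)/d(logit_c) = p_c - 1[c == y]
                delta = probs[c] - (1.0 if c == y else 0.0)
                for j in range(num_features):
                    grads[c][j] += delta * x[j]
        loss /= len(samples)
        if loss < threshold:                     # step 3: stopping rule
            break
        for c in range(num_classes):             # step 2: gradient descent
            for j in range(num_features):
                weights[c][j] -= lr * grads[c][j] / len(samples)
    return weights, loss


def predict(weights, x):
    """Greedy choice: index of the class with the highest probability."""
    probs = softmax(logits_for(weights, x))
    return max(range(len(probs)), key=probs.__getitem__)
```

With two one-hot samples such as `[([1.0, 0.0], 0), ([0.0, 1.0], 1)]`, the loss starts at ln 2 and the loop stops after a few dozen updates.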
In addition, because the output question-answer information takes the form of question-answer pairs, the output format is set to the question-answer pair format in advance when the UNILM model is trained. The resulting question-answer information is therefore in question-answer pair format regardless of whether the input parameter is a single word, a single sentence or a text paragraph, which improves the recognition coverage of the question-answer information.
In an embodiment of the invention, as a further definition and explanation, so that first category statement information from different data sources can be applied to the correspondingly trained unified language model and unconventional knowledge question-answer recognition achieves wide coverage, performing question-answer processing on the first category statement information according to the matched unified language model includes:
if the data source is a question-answer knowledge base, performing model operation processing with the first category statement information as the input parameter of the unified language model according to the replacement link;
if the data source is a knowledge graph base, splitting the first category statement information according to the triple form of the knowledge graph, and performing model operation processing with the subject and predicate obtained by splitting as the input parameters of the unified language model according to the replacement link;
if the data source is a product list database, extracting the list structured data of the first category statement from the product list database, and performing model operation processing with the extracted list structured data as the input parameter of the unified language model according to the replacement link;
and if the data source is a product clause database, extracting the clause structured data of the first category statement from the product clause database, and performing model operation processing with the extracted clause structured data as the input parameter of the unified language model according to the replacement link.
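The four branches above amount to a dispatch on the data source: each source has its own way of preparing model input and its own trained model. The sketch below shows only that routing. The preparers and the `echo_model` stand-in are hypothetical placeholders (a real system would call the triple splitter, the pdf extraction step and the trained UNILM models); the source names are likewise assumptions.

```python
from typing import Callable, Dict, List, Tuple

Preparer = Callable[[str], List[str]]            # sentence -> model input fields
Model = Callable[[List[str]], Tuple[str, str]]   # fields -> (question, answer)


def answer(sentence: str, source: str,
           preparers: Dict[str, Preparer],
           models: Dict[str, Model]) -> Tuple[str, str]:
    """Prepare the input for the given data source and run its model."""
    if source not in preparers or source not in models:
        raise KeyError(f"no handler registered for data source {source!r}")
    fields = preparers[source](sentence)
    return models[source](fields)


# Stand-in preparers, one per data source (all hypothetical).
preparers: Dict[str, Preparer] = {
    "faq": lambda s: [s],                            # sentence used as-is
    "knowledge_graph": lambda s: s.split(" ", 1),    # naive subject/predicate split
    "product_list": lambda s: ["list-data", s],      # placeholder for extracted rows
    "product_clause": lambda s: ["clause-data", s],  # placeholder for extracted text
}


def echo_model(name: str) -> Model:
    """Stand-in for a trained model: echoes its input and tags the answer."""
    def run(fields: List[str]) -> Tuple[str, str]:
        return (" ".join(fields), f"<answer from {name}>")
    return run


models: Dict[str, Model] = {name: echo_model(name) for name in preparers}
```

Keeping preparers and models in separate dictionaries means a new data source can be added by registering one entry in each, without touching `answer`.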
In the embodiment of the invention, for a specific question-answer knowledge base such as an FAQ question-answer knowledge base, if the data source is the question-answer knowledge base, the first category statement information is used, according to the replacement link, as the input parameter of the unified language model for question-answer processing. Specifically, the FAQ question-answer knowledge base is a database built by developers who have completed the question-answer recognition of conventional question-answer sentences; it stores a large number of entity names and questions as inputs together with the corresponding output (question, answer) pairs. During recognition, the first category statement information is therefore taken, according to the replacement link, as the input parameter of the unified language model trained on the second category statement information training sample set, and the model is run to obtain the question-answer information. For example, the UNILM model is trained on sentence content such as "prosperous insurance feature" in the second category statement information training sample set; after the data source of the first category statement information is determined to be the FAQ question-answer knowledge base, the first category statement information "what is the selling point of jinriy life" replaces it as the input parameter of the UNILM model, and the model is run to obtain the question-answer information. It should be noted that most of the statement information stored in the FAQ question-answer knowledge base concerns general questions in application scenarios such as insurance products and insurance product selection, for example "How is a claim settled for product #?" and "What features does product # have?", so that after the relevant input is replaced, question-answer pairs about claims, features and the like are obtained, and question-answer coverage is improved.
In the embodiment of the invention, if the data source is a knowledge graph base, the data in it is stored in (S, P, O) triple format, where S represents a subject, P a predicate and O an object. For example, "the peaceful waiting period is 90 days" is stored in the knowledge graph base as (peaceful, waiting period, 90 days). The UNILM model is therefore trained with triples as input samples. Correspondingly, the first category statement information is split according to the triple form of the knowledge graph, and the subject and predicate obtained by splitting are used, according to the replacement link, as the input parameters of the unified language model for question-answer processing. Specifically, the first category statement information is split following the triple storage structure of the knowledge graph; the subject obtained by splitting is an entity and the predicate is an attribute, and the entity and attribute are then used as input parameters of the UNILM model, which is run to complete the question-answer identification of the first category statement information. It should be noted that most of the statement information stored in the knowledge graph base is suited to questions about existing products and to identifying existing triple questions and answers, including but not limited to product information questions and insurance questions, for example identifying the question-answer pair for "the grace period of peaceful".
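To make the splitting step concrete, here is a minimal sketch that finds a known subject and predicate inside a question by substring matching against stored (S, P, O) triples. The matching strategy and the names `TRIPLES` and `split_subject_predicate` are assumptions for illustration; the patent does not say how the split is performed, and the only triple used is the one given in the text.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# The single triple given as an example in the text.
TRIPLES: List[Triple] = [("peaceful", "waiting period", "90 days")]


def split_subject_predicate(sentence: str,
                            triples: List[Triple]) -> Optional[Tuple[str, str]]:
    """Return the (subject, predicate) mentioned in the sentence, if any.

    Longer names are tried first so that a longer entity name wins over a
    shorter one contained in it.
    """
    text = sentence.lower()
    subjects = sorted({s for s, _, _ in triples}, key=len, reverse=True)
    for subject in subjects:
        if subject.lower() not in text:
            continue
        predicates = sorted({p for s, p, _ in triples if s == subject},
                            key=len, reverse=True)
        for predicate in predicates:
            if predicate.lower() in text:
                return subject, predicate
    return None
```

For the question "What is the waiting period of peaceful?", the function yields the entity "peaceful" and the attribute "waiting period", which would then be passed to the model as its input parameters.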
In the embodiment of the invention, if the data source is the product list database, namely the specific product list database, and the data stored in the product list database is a pdf-form list, and the stored sentences are all structured, the list structured data of the first category of sentences is extracted from the product list database, and the extracted list structured data is used as the input parameters of the unified language model for question and answer processing according to the alternative link. Specifically, structured data in the pdf list is extracted based on the pipeline technology, that is, table contents are converted into structured statement contents, and after data structuring is completed, the obtained list structured data is used as input parameters of a UNILM model to perform operation, so that question and answer recognition of first category statement information is completed. The trained UNILM model is trained after structured data extraction is carried out on the basis of the second category statement information, so that the purpose of replacing the first category statement information can be achieved, when the first category statement information is replaced, the list head of the structured data needs to be traversed, namely the table head serves as an input parameter, and the table content is output obtained by the model in advance under the corresponding condition.
Since the table content and the header in the list structured data may each be a single sentence or several sentences (a text paragraph), a semantic dependency analyzer is used to process the first category statement information in order to extract the entity and the intent as model input parameters. For example, if the first category statement information of the user's question is "how much does a 16-year-old boy pay to buy peace life", the entity extraction module in the semantic dependency analyzer extracts "peace life" as the entity and identifies the intent as "premium counseling". The semantic dependency analyzer can further parse out the two constraint conditions "age of 16" and "boy". The entity and the intent are then used as model input parameters, and, combined with the two constraint conditions, the output is determined to be "144 yuan to the age of 80 and 160 yuan to the age of 100".
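How the entity, the intent and the constraint conditions narrow the answer can be illustrated with a small sketch. It is an assumption-laden stand-in: a plain row lookup replaces the UNILM model, the `ParsedQuestion` type and the row layout are hypothetical, and the single row holds only the figures quoted in the example above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ParsedQuestion:
    entity: str                                       # e.g. the product name
    intent: str                                       # e.g. "premium counseling"
    constraints: dict = field(default_factory=dict)   # e.g. age, gender


# Hypothetical structured rows extracted from a product list; the premium
# text is the one quoted in the example, not real pricing data.
PREMIUM_ROWS = [
    {"product": "peace life", "age": 16, "gender": "male",
     "premium": "144 yuan to the age of 80 and 160 yuan to the age of 100"},
]


def answer_from_rows(question: ParsedQuestion, rows: List[dict]) -> Optional[str]:
    """Return the table content whose header values satisfy every constraint."""
    if question.intent != "premium counseling":
        return None
    for row in rows:
        if row["product"] != question.entity:
            continue
        if all(row.get(key) == value for key, value in question.constraints.items()):
            return row["premium"]
    return None
```

The lookup returns nothing when any constraint fails, which mirrors the role the two constraint conditions play in selecting the output.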
In the embodiment of the present invention, if the data source is a product clause database, the data stored in that database is in pdf format and the stored statements are structured. The clause structured data of the first category statement is therefore extracted from the product clause database, and the extracted clause structured data is used as an input parameter of the unified language model for question-answering processing according to the replacement link. Specifically, structured data in the product clause pdf file is extracted based on the pipeline technology, that is, clause content is converted into structured statement content; after data structuring is completed, the obtained clause structured data is used as input parameters of the UNILM model, so that question-and-answer identification of the first category statement information is completed. Because the UNILM model was trained on structured data extracted from the second category statement information, the first category statement information can be substituted for it. When replacement is performed based on the replacement link, each paragraph text of the structured data needs to be traversed, and the model operation is performed with the recognized intent combined in as an input parameter, so that question-and-answer identification of the first category statement information is completed. In addition, because the first category statement information here is product clause data, whose content consists of large paragraphs of text, what the model operation produces after the product clause data has been structured is an answer extracted and generated from the paragraph text according to the identified intent.
For example, the large text content in the product clause is "after we have received the insurance payment application book and written the above-mentioned related certification material ... means the value of the insurance policy ... since you signed off the next day of the main insurance contract, there is a 20-day hesitation period." After structured processing, formatted text contents such as "the loan amount should not exceed ...", "the loan period should not exceed 6 months at the longest each time" and "the hesitation period is 20 days" are obtained, and these are then used as model input for the model operation to obtain the question and answer information.
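The structuring step for clause text can be sketched as below. This is a minimal illustration under stated assumptions: sentence splitting with a regular expression stands in for the pipeline technology, keyword matching stands in for intent recognition, and `structure_clause` and `select_by_intent` are hypothetical helper names.

```python
import re
from typing import List


def structure_clause(paragraph: str) -> List[str]:
    """Split a clause paragraph into standalone statements."""
    # Break after a full stop or semicolon that is followed by whitespace.
    parts = re.split(r"(?<=[.;])\s+", paragraph)
    return [part.strip().rstrip(".;") for part in parts if part.strip()]


def select_by_intent(statements: List[str], intent_keyword: str) -> List[str]:
    """Keep only the statements that mention the recognized intent."""
    keyword = intent_keyword.lower()
    return [s for s in statements if keyword in s.lower()]


paragraph = ("The loan period should not exceed 6 months at the longest each time. "
             "The hesitation period is 20 days.")
statements = structure_clause(paragraph)
matches = select_by_intent(statements, "hesitation period")
```

In the embodiment the selected statements would be passed to the model together with the intent; here they are simply returned so the traversal over paragraph text is visible.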
In an embodiment of the present invention, for further definition and explanation, the parsing the data source of the first category statement information includes: acquiring a storage path of the first category statement information, and analyzing a storage position in the storage path; and determining the data source of the first category according to the database corresponding to the storage position matched with at least one data source.
In the embodiment of the present invention, because different data sources are data sources stored according to different data formats, when analyzing a data source, a storage path of first category statement information is obtained, where the storage path represents a database in which data of the first category statement information can be stored or a system path corresponding to a database to be stored. Specifically, the storage location of the storage path is analyzed, that is, the content of the character string, the code identifier, and the like belonging to the storage location in the storage path is screened, so that the data source is determined according to the storage location.
In addition, the data sources include at least one of the question and answer knowledge base, the knowledge graph library, the product list database and the product clause database, and one piece of data may be stored in more than one of them, for example in both the question and answer knowledge base and the knowledge graph library, so that it has several storage paths. If several data sources are resolved from the storage paths in the process of analyzing the data source, for example when the question and answer knowledge base, the knowledge graph library, the product list database and the product clause database all store the statement information, the data source is determined according to the preset priority of the question and answer knowledge base, the knowledge graph library, the product list database and the product clause database.
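Resolving the data source from storage paths, with a priority tie-break, might look like the following sketch. The path markers (`qa_kb`, `kg`, `product_list`, `product_clause`) are invented for illustration, and the priority order is assumed to follow the order in which the text lists the four data sources.

```python
from enum import IntEnum
from typing import Iterable, Optional


class DataSource(IntEnum):
    # Lower value means higher priority (assumed order, see lead-in).
    QA_KNOWLEDGE_BASE = 0
    KNOWLEDGE_GRAPH = 1
    PRODUCT_LIST = 2
    PRODUCT_CLAUSE = 3


# Hypothetical path segments that identify each database.
PATH_MARKERS = {
    "qa_kb": DataSource.QA_KNOWLEDGE_BASE,
    "kg": DataSource.KNOWLEDGE_GRAPH,
    "product_list": DataSource.PRODUCT_LIST,
    "product_clause": DataSource.PRODUCT_CLAUSE,
}


def resolve_data_source(storage_paths: Iterable[str]) -> Optional[DataSource]:
    """Match each path's segments against the markers and pick by priority."""
    matched = set()
    for path in storage_paths:
        for segment in path.strip("/").split("/"):
            if segment in PATH_MARKERS:
                matched.add(PATH_MARKERS[segment])
    return min(matched) if matched else None
```

When a statement is found under several paths, the source with the smallest enum value wins, which is one simple way to encode a preset priority.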
In an embodiment of the present invention, in order to effectively cover all question-answering statements and improve the answering efficiency of knowledge question answering, before acquiring the first category statement information to be subjected to knowledge question-and-answer processing, the method further includes: collecting at least one piece of statement information requesting a knowledge question and answer; searching whether question-answer information of the statement information exists in a question-answer library corresponding to the second category statement information; if it does not exist, determining the statement information as first category statement information, and performing knowledge question-answering processing; and if it exists, determining that the found question and answer information is the question and answer information of the statement.
In the embodiment of the present invention, conventional question-answer statements are question-answer pairs with standard answers obtained after a large amount of data processing and statement recognition. When a user requests a knowledge question and answer, it is therefore first judged whether the request is a conventional question-answer statement. That is, at least one piece of statement information requesting a knowledge question and answer is collected, where the statement information represents a template-type question statement that the user has entered or selected. Then, the question-answer library established from the second category statement information is searched for question-answer information matching the statement information. If such information exists, the statement information is a conventional question-answer statement; if not, the statement is an unconventional question-answer statement and is determined to be first category statement information, so that the identification method in steps 101 to 103 is carried out.
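The routing decision described above can be sketched in a few lines. The sketch assumes the question-answer library is a dictionary keyed by normalized question text and uses exact matching only; `route_statement` and the sample library entry are hypothetical.

```python
from typing import Dict, Optional, Tuple


def normalize(statement: str) -> str:
    """Lower-case the statement and collapse repeated whitespace."""
    return " ".join(statement.lower().split())


def route_statement(statement: str,
                    qa_library: Dict[str, str]) -> Tuple[str, Optional[str]]:
    """Classify a statement as conventional (second category) or not (first)."""
    answer = qa_library.get(normalize(statement))
    if answer is not None:
        return ("second_category", answer)   # answer found in the library
    return ("first_category", None)          # falls through to steps 101-103


# Hypothetical library content, using a value quoted elsewhere in the text.
qa_library = {"what is the hesitation period": "20 days"}
```

A statement that misses the library is not answered here; it is only labeled so that the model-based processing can take over.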
In an embodiment of the present invention, for further limitation and description, the searching whether the question-answer information of the first category statement information exists in the question-answer library corresponding to the second category statement information includes: respectively extracting semantic terms in the first category of sentences and the second category of sentences, and calculating the similarity between the semantic terms; and judging whether the question-answer information of the second category statement information is suitable for the knowledge question-answer of the first category statement information according to the similarity.
Since the second category statement information is a conventional question-and-answer statement, and all the (question, answer) pairs stored in the corresponding question-answer library match conventional question-and-answer statements, the search of the question-answer library for the (question, answer) pair corresponding to the first category statement information can be based on similarity. Specifically, the semantic terms of the first category statement information and the second category statement information, including subject terms, predicate terms, object terms and the like, may be extracted based on a natural language processing technique. The similarities between the semantic terms are then calculated, namely the similarity between the subject terms of the first category statement information and those of the second category statement information, the similarity between the predicate terms of the first category statement information and those of the second category statement information, and the similarity between the object terms of the first category statement information and those of the second category statement information.
Whether the question-answer information of the second category statement information is suitable for the knowledge question-answer of the first category statement information is judged based on the three calculated similarities, namely a similarity threshold value is preset according to the similarity of word meanings, if any two of the three similarities exceed the similarity threshold value, the question-answer information of the second category statement information is determined to be suitable for the knowledge question-answer of the first category statement information, and the question-answer information corresponding to the second category statement information determined by similarity matching in a question-answer library can be used as the question-answer information of the first category statement information to finish recognition.
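The two-out-of-three rule can be written down directly. In this sketch the string ratio from Python's `difflib` stands in for a word-meaning similarity measure, and the threshold of 0.8 is an arbitrary illustrative value; neither is specified by the source.

```python
from difflib import SequenceMatcher
from typing import Tuple

Terms = Tuple[str, str, str]  # (subject term, predicate term, object term)


def term_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in for semantic similarity."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_applicable(first: Terms, second: Terms, threshold: float = 0.8) -> bool:
    """True if at least two of the three pairwise similarities exceed the threshold."""
    similarities = [term_similarity(x, y) for x, y in zip(first, second)]
    return sum(s > threshold for s in similarities) >= 2
```

With identical subject and predicate terms the stored answer is treated as applicable even if the object terms differ, which is the behavior the rule implies.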
In an embodiment of the present invention, in order to optimize the unified language model to improve the recognition accuracy of the question-answering sentence, the method further includes: receiving a question-answer feedback result obtained according to the output question-answer information, wherein the question-answer feedback result is used for representing the answer satisfaction degree of the question-answer information; and determining to convert the first category statement information into the second category statement information according to the question and answer feedback result, and determining to update the question and answer information to a second category statement information training sample set for model updating.
Specifically, after the first category statement information is identified based on the unified language model, the obtained question and answer information is fed back to the user, and the user judges from it whether the answer is the one the user wanted. The question-answer feedback result returned by the user is information expressing a degree of satisfaction, such as satisfied, dissatisfied, acceptable or unacceptable; that is, the question-answer feedback result represents the answer satisfaction degree of the question and answer information. After receiving the question-answer feedback result, the current execution end determines, according to that result, whether to convert the first category statement information into second category statement information. If the feedback result is satisfied, the question and answer information identified for the first category statement information is correct, so the first category statement information of the unconventional question and answer can be converted into second category statement information of the conventional question and answer. In addition, because the question and answer information is the answer corresponding to the first category statement information, it is added to the second category statement information training sample set at the same time as the conversion, so that when the model is trained again, training efficiency is improved by the updated training set.
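A minimal sketch of the feedback step follows. It assumes the question-answer library is a dictionary and the training sample set is a list of (question, answer) pairs; the `SATISFIED` labels and the `apply_feedback` function are hypothetical.

```python
from typing import Dict, List, Tuple

# Feedback labels treated as positive; assumed from the examples in the text.
SATISFIED = {"satisfied", "acceptable"}


def apply_feedback(statement: str,
                   answer: str,
                   feedback: str,
                   qa_library: Dict[str, str],
                   training_samples: List[Tuple[str, str]]) -> bool:
    """Promote a first category statement to second category on positive feedback."""
    if feedback not in SATISFIED:
        return False
    # Conversion: the statement now has a stored answer in the library.
    qa_library[statement] = answer
    # The pair also joins the training sample set for the next model update.
    training_samples.append((statement, answer))
    return True
```

Negative feedback leaves both the library and the sample set unchanged, so only confirmed answers influence later training.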
Compared with the prior art, the embodiment of the present invention provides a knowledge question and answer processing method, which comprises: acquiring first category statement information to be subjected to knowledge question and answer processing, and analyzing a data source of the first category statement information; matching a trained unified language model according to the analyzed data source, wherein the unified language model is obtained by training according to second category statement information in different data sources, and the second category statement information has a replacement relation with the first category statement information; and performing question-answering processing on the first category statement information according to the matched unified language model to generate the question-answer information of the first category statement information. Full question-and-answer coverage of unconventional statement information is thereby achieved, the manpower and material resources for identifying question-answer statements are greatly reduced, and the processing efficiency of knowledge question answering is improved.
Further, as an implementation of the method shown in fig. 1, an embodiment of the present invention provides a knowledge question and answer processing apparatus, as shown in fig. 3, the apparatus includes:
the acquiring module 21 is configured to acquire first category statement information to be subjected to a knowledge question answering process, and analyze a data source of the first category statement information;
the matching module 22 is configured to match a unified language model that is trained according to an analyzed data source, where the unified language model is obtained by training according to second category statement information in different data sources, and the second category statement information has a replacement relationship with the first category statement information;
and the processing module 23 is configured to perform question-answering processing on the first category statement information according to the matched unified language model, and generate question-answering information of the first category statement information.
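The cooperation of the three modules can be sketched as a small class. This is an illustrative arrangement only: the class name, the use of plain callables as stand-ins for trained unified language models, and the string keys for data sources are all assumptions.

```python
from typing import Callable, Dict

Model = Callable[[str], str]  # stand-in for a trained unified language model


class KnowledgeQAApparatus:
    def __init__(self,
                 resolve_source: Callable[[str], str],
                 models: Dict[str, Model]) -> None:
        self._resolve_source = resolve_source
        self._models = models

    def acquire(self, statement: str) -> str:
        """Acquisition module 21: determine the statement's data source."""
        return self._resolve_source(statement)

    def match(self, data_source: str) -> Model:
        """Matching module 22: pick the model trained for that data source."""
        try:
            return self._models[data_source]
        except KeyError:
            raise LookupError(
                f"no trained model for data source {data_source!r}") from None

    def process(self, statement: str) -> str:
        """Processing module 23: run the matched model on the statement."""
        model = self.match(self.acquire(statement))
        return model(statement)
```

Keeping source resolution and the model registry as injected dependencies makes it easy to swap in a different resolver or add a data source without touching the processing step.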
Further, the first category statement information is statement information for which the question and answer information is not found in a preset question and answer library, the second category statement information is statement information for which the question and answer information is found in the preset question and answer library, the data source includes a question and answer knowledge base, a knowledge graph library, a product list database and a product clause database, and the apparatus further includes: a training module and an establishing module;
the acquisition module is further configured to acquire a second category statement information training sample set in the question and answer knowledge base, the knowledge graph library, the product list database, and the product clause database, respectively;
the training module is used for training the unified language model constructed on the language network by using the second category statement information training sample set, to obtain trained unified language models respectively suitable for the question and answer knowledge base, the knowledge graph library, the product list database and the product clause database;
the establishing module is configured to establish a replacement link between the input parameter of the unified language model and the first category statement information to determine a replacement relationship between the first category statement information and the second category statement information, where the replacement relationship represents that the first category statement information replaces the second category statement information as the input parameter of the unified language model.
Further,
the processing module is specifically configured to perform model operation processing on the first category statement information as an input parameter of the unified language model according to the replacement link if the data source is a question-answer knowledge base;
the processing module is specifically configured to split the first category statement information according to a triple form of a knowledge graph if the data source is a knowledge graph library, and perform model operation processing by using a split statement subject and a statement predicate as input parameters of the unified language model according to the replacement link;
the processing module is specifically configured to, if the data source is a product list database, extract the list structured data of the first category statement from the product list database, and perform model operation processing by using the extracted list structured data as an input parameter of the unified language model according to the replacement link;
the processing module is specifically configured to, if the data source is a product clause database, extract clause structured data of the first category statement from the product clause database, and perform model operation processing using the extracted clause structured data as an input parameter of the unified language model according to the replacement link.
Further, the acquisition module comprises:
the acquisition unit is used for acquiring a storage path of the first category statement information and analyzing a storage position in the storage path;
and the determining unit is used for determining the data source of the first category according to the database corresponding to the storage position matched with at least one data source.
Further, the apparatus further comprises:
the collecting module is used for collecting at least one piece of statement information requesting a knowledge question and answer;
the searching module is used for searching whether the question-answer information of the statement information exists in a question-answer library corresponding to the second category statement information;
the first determining module is used for determining the statement information as first-class statement information and performing knowledge question-answering processing if the statement information does not exist;
and the second determining module is used for determining that the searched question and answer information is the question and answer information of the statement if the searched question and answer information exists.
Further, the lookup module includes:
the extraction unit is used for respectively extracting semantic terms in the first category of sentences and the second category of sentences and calculating the similarity between the semantic terms;
and the judging unit is used for judging whether the question-answer information of the second category statement information is suitable for the knowledge question-answer of the first category statement information according to the similarity.
Further, the apparatus further comprises:
the receiving module is used for receiving a question and answer feedback result obtained according to the output question and answer information, and the question and answer feedback result is used for representing the answer satisfaction degree of the question and answer information;
and the judging module is used for determining to convert the first category statement information into the second category statement information according to the question and answer feedback result, and determining to update the question and answer information to a second category statement information training sample set so as to update the model.
Compared with the prior art, the embodiment of the present invention provides a knowledge question and answer processing device, which acquires first category statement information to be subjected to knowledge question and answer processing, and analyzes a data source of the first category statement information; matches a trained unified language model according to the analyzed data source, wherein the unified language model is obtained by training according to second category statement information in different data sources, and the second category statement information has a replacement relation with the first category statement information; and performs question-answering processing on the first category statement information according to the matched unified language model to generate the question-answer information of the first category statement information. Full question-and-answer coverage of unconventional statement information is thereby achieved, the manpower and material resources for identifying question-answer statements are greatly reduced, and the processing efficiency of knowledge question answering is improved.
According to an embodiment of the present invention, a storage medium is provided, wherein the storage medium stores at least one executable instruction, and the executable instruction can execute the knowledge question and answer processing method in any of the above method embodiments.
Fig. 4 is a schematic structural diagram of a computer device according to an embodiment of the present invention, and the specific embodiment of the present invention does not limit the specific implementation of the computer device.
As shown in fig. 4, the computer device may include: a processor 302, a communication interface 304, a memory 306, and a communication bus 308.
Wherein: the processor 302, communication interface 304, and memory 306 communicate with each other via a communication bus 308.
A communication interface 304 for communicating with network elements of other devices, such as clients or other servers.
The processor 302 is configured to execute the program 310, and may specifically execute the relevant steps in the above-mentioned embodiment of the knowledge question answering method.
In particular, program 310 may include program code comprising computer operating instructions.
The processor 302 may be a central processing unit CPU, or an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits configured to implement an embodiment of the present invention. The computer device includes one or more processors, which may be the same type of processor, such as one or more CPUs; or may be different types of processors such as one or more CPUs and one or more ASICs.
And a memory 306 for storing a program 310. Memory 306 may comprise high-speed RAM memory and may also include non-volatile memory (non-volatile memory), such as at least one disk memory.
The program 310 may specifically be configured to cause the processor 302 to perform the following operations:
acquiring first-class statement information to be subjected to knowledge question answering, and analyzing a data source of the first-class statement information;
matching a unified language model which is trained according to the analyzed data sources, wherein the unified language model is obtained by training according to second category statement information in different data sources, and the second category statement information has a replacement relation with the first category statement information;
and performing question-answer processing on the first category statement information according to the matched unified language model to generate question-answer information of the first category statement information.
It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and alternatively, they may be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, and in some cases, the steps shown or described may be performed in an order different than that described herein, or they may be separately fabricated into individual integrated circuit modules, or multiple ones of them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A knowledge question-answer processing method is characterized by comprising the following steps:
acquiring first-class statement information to be subjected to knowledge question answering, and analyzing a data source of the first-class statement information;
matching a unified language model which is trained according to the analyzed data sources, wherein the unified language model is obtained by training according to second category statement information in different data sources, and the second category statement information has a replacement relation with the first category statement information;
and performing question-answer processing on the first category statement information according to the matched unified language model to generate question-answer information of the first category statement information.
2. The method according to claim 1, wherein the first category statement information is statement information for which no question and answer information is found in a preset question and answer library, the second category statement information is statement information for which question and answer information is found in the preset question and answer library, the data sources include a question and answer knowledge base, a knowledge graph library, a product list database, and a product clause database, and before the matching of the trained unified language model according to the parsed data sources, the method further includes:
respectively acquiring a second category statement information training sample set in the question and answer knowledge base, the knowledge graph library, the product list database and the product clause database;
training a unified language model constructed on a language network by using the second category statement information training sample set, to obtain trained unified language models respectively suitable for the question and answer knowledge base, the knowledge graph library, the product list database and the product clause database;
and establishing a replacement link between the input parameters of the unified language model and the first category statement information to determine a replacement relationship between the first category statement information and the second category statement information, wherein the replacement relationship represents that the first category statement information replaces the second category statement information as the input parameters of the unified language model.
3. The method according to claim 2, wherein the performing question-answering processing on the first category statement information according to the matched unified language model comprises:
if the data source is a question-answer knowledge base, performing model operation processing by using the first category statement information as an input parameter of the unified language model according to the replacement link;
if the data source is a knowledge graph library, splitting the first category statement information according to a triple form of a knowledge graph, and performing model operation processing by taking a split statement subject and a statement predicate as input parameters of the unified language model according to the replacement link;
if the data source is a product list database, extracting the list structured data of the first category statement from the product list database, and performing model operation processing by using the extracted list structured data as the input parameters of the unified language model according to the replacement link;
and if the data source is a product clause database, extracting clause structured data of the first category statement from the product clause database, and performing model operation processing by using the extracted clause structured data as an input parameter of the unified language model according to the replacement link.
4. The method of claim 1, wherein parsing the data source of the first category statement information comprises:
acquiring a storage path of the first category statement information, and analyzing a storage position in the storage path;
and determining the data source of the first category according to the database corresponding to the storage position matched with at least one data source.
5. The method according to claim 1, wherein before the obtaining of the first category statement information to be subjected to the knowledge question and answer processing, the method further comprises:
collecting at least one statement information of a request knowledge question and answer;
searching whether question-answer information of the statement information exists in a question-answer library corresponding to the second category statement information;
if the statement information does not exist, determining the statement information as first-class statement information, and performing knowledge question-answering processing;
and if so, determining that the searched question and answer information is the question and answer information of the statement.
6. The method according to claim 5, wherein the searching whether the question-answer information of the first category statement information exists in the question-answer library corresponding to the second category statement information comprises:
respectively extracting semantic terms in the first category of sentences and the second category of sentences, and calculating the similarity between the semantic terms;
and judging whether the question-answer information of the second category statement information is suitable for the knowledge question-answer of the first category statement information according to the similarity.
7. The method according to any one of claims 1-6, further comprising:
receiving a question-answer feedback result obtained according to the output question-answer information, wherein the question-answer feedback result is used for representing the answer satisfaction degree of the question-answer information;
and determining to convert the first category statement information into the second category statement information according to the question and answer feedback result, and determining to update the question and answer information to a second category statement information training sample set for model updating.
8. A knowledge question-answering processing apparatus, comprising:
an acquisition module, configured to acquire first category statement information to be subjected to knowledge question-answering processing, and to analyze a data source of the first category statement information;
a matching module, configured to match a trained unified language model according to the analyzed data source, wherein the unified language model is obtained by training on second category statement information from different data sources, and the second category statement information and the first category statement information have a replacement relationship;
and a processing module, configured to perform question-answer processing on the first category statement information according to the matched unified language model, to generate question-answer information of the first category statement information.
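The three modules of claim 8 can be pictured as a small pipeline. The sketch below assumes each incoming request carries its text and a data-source label, and that trained models are held in a registry keyed by data source; the request layout, the registry, and the function names are illustrative assumptions, and the models are plain callables standing in for a trained unified language model.

```python
from typing import Callable, Dict, Tuple

# A model is any callable that maps a statement to question-answer information.
Model = Callable[[str], str]


def acquire(request: Dict[str, str]) -> Tuple[str, str]:
    """Acquisition module: return the statement text and its data source."""
    return request["text"], request["source"]


def match_model(source: str, registry: Dict[str, Model]) -> Model:
    """Matching module: pick the trained model registered for the data source."""
    try:
        return registry[source]
    except KeyError:
        raise LookupError(f"no model registered for data source {source!r}") from None


def process(request: Dict[str, str], registry: Dict[str, Model]) -> str:
    """Processing module: answer the statement with the matched model."""
    text, source = acquire(request)
    model = match_model(source, registry)
    return model(text)
```

For example, with `registry = {"app": lambda s: "answer for " + s}`, calling `process({"text": "renewal", "source": "app"}, registry)` runs the stand-in model registered for the "app" data source.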
9. A storage medium having at least one executable instruction stored therein, the executable instruction causing a processor to perform operations corresponding to the question-answer processing method according to any one of claims 1 to 7.
10. A computer device, comprising: a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another through the communication bus;
the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform operations corresponding to the knowledge question-answer processing method of any one of claims 1-7.
CN202110565939.2A 2021-05-24 2021-05-24 Knowledge question-answering processing method, device, equipment and storage medium Pending CN113254612A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110565939.2A CN113254612A (en) 2021-05-24 2021-05-24 Knowledge question-answering processing method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113254612A true CN113254612A (en) 2021-08-13

Family

ID=77183989

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110565939.2A Pending CN113254612A (en) 2021-05-24 2021-05-24 Knowledge question-answering processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113254612A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200226212A1 (en) * 2019-01-15 2020-07-16 International Business Machines Corporation Adversarial Training Data Augmentation Data for Text Classifiers
CN110532397A (en) * 2019-07-19 2019-12-03 平安科技(深圳)有限公司 Answering method, device, computer equipment and storage medium based on artificial intelligence
CN110516055A (en) * 2019-08-16 2019-11-29 西北工业大学 A kind of cross-platform intelligent answer implementation method for teaching task of combination BERT

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117390169A (en) * 2023-12-11 2024-01-12 季华实验室 Form data question-answering method, device, equipment and storage medium
CN117390169B (en) * 2023-12-11 2024-04-12 季华实验室 Form data question-answering method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
US11520975B2 (en) Lean parsing: a natural language processing system and method for parsing domain-specific languages
AU2017296412B2 (en) System and method for automatically understanding lines of compliance forms through natural language patterns
US20200257659A1 (en) Method and apparatus for determing description information, electronic device and computer storage medium
CN107451153A (en) The method and apparatus of export structure query statement
JPH07295989A (en) Device that forms interpreter to analyze data
US20080208836A1 (en) Regression framework for learning ranking functions using relative preferences
CN115470338B (en) Multi-scenario intelligent question answering method and system based on multi-path recall
CN111274822A (en) Semantic matching method, device, equipment and storage medium
CN111831810A (en) Intelligent question and answer method, device, equipment and storage medium
CN114357195A (en) Knowledge graph-based question-answer pair generation method, device, equipment and medium
CN113157887A (en) Knowledge question-answering intention identification method and device and computer equipment
CN111651994B (en) Information extraction method and device, electronic equipment and storage medium
CN113254612A (en) Knowledge question-answering processing method, device, equipment and storage medium
CN113705207A (en) Grammar error recognition method and device
WO2021004118A1 (en) Correlation value determination method and apparatus
CN111723182A (en) Key information extraction method and device for vulnerability text
CN114842982B (en) Knowledge expression method, device and system for medical information system
CN114417010A (en) Knowledge graph construction method and device for real-time workflow and storage medium
CN112015857A (en) User perception evaluation method and device, electronic equipment and computer storage medium
CN117408679B (en) Operation and maintenance scene information processing method and device
CN117591657B (en) Intelligent dialogue management system and method based on AI
CN110728148B (en) Entity relation extraction method and device
CA3076418C (en) Lean parsing: a natural language processing system and method for parsing domain-specific languages
CN116541071A (en) Application programming interface migration method based on prompt learning
CN117272066A (en) Data pattern matching method and system for interface

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination