WO2022095368A1 - Question-and-answer corpus generation method and apparatus based on a text generation model - Google Patents

Info

Publication number
WO2022095368A1
Authority
WO
WIPO (PCT)
Prior art keywords
question
historical
questions
keywords
keyword
Application number
PCT/CN2021/090798
Inventor
谢忠玉
陈立
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Application filed by 平安科技(深圳)有限公司
Publication of WO2022095368A1 (patent/WO2022095368A1/zh)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • The present application relates to the technical field of artificial intelligence, and in particular to a method, apparatus, computer device, and storage medium for generating a question-and-answer corpus based on a text generation model.
  • Question answering systems are an important area of artificial intelligence. Many businesses need a customer service system to answer users' questions, and most user questions are concentrated on a small number of high-frequency "head" questions; this is the motivation for Frequently Asked Questions (FAQ).
  • FAQ: Frequently Asked Questions.
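  • A method, apparatus, computer device, and storage medium for generating a question-and-answer corpus based on a text generation model are provided.
  • The method includes the following steps:
    • obtaining historical questions and standard documents, and extracting the keywords in the standard documents and the paraphrase sentences corresponding to the keywords;
    • performing word segmentation on the historical questions, identifying and discarding their entity nouns, and obtaining their syntactic feature words;
    • combining the syntactic feature words with the keywords, and inputting the combined data into a pre-trained text generation model to obtain the target questions corresponding to the keywords, wherein the text generation model is trained on training samples annotated with keywords and syntactic feature words; and
    • constructing, according to the target question and the paraphrase sentence corresponding to each keyword, a question-answer pair containing the target question and the paraphrase sentence.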
  • The apparatus includes a data acquisition module, configured to obtain historical questions and standard documents and to extract the keywords in the standard documents and the paraphrase sentences corresponding to the keywords;
  • a historical question word segmentation module, configured to perform word segmentation on the historical questions, identify and discard the entity nouns in them, and obtain their syntactic feature words;
  • a target question generation module, configured to combine the syntactic feature words with the keywords and input the combined data into the pre-trained text generation model to obtain the target questions corresponding to the keywords, wherein the text generation model is trained on training samples annotated with keywords and syntactic feature words; and
  • a question-answer pair construction module, configured to construct a question-answer pair containing the target question and the paraphrase sentence according to the target question and the paraphrase sentence corresponding to the keyword.
  • The above method, apparatus, computer device, and storage medium work as follows:
    • They obtain historical questions and standard documents.
    • They segment the historical questions, then identify and discard the entity nouns to obtain the syntactic feature words.
    • They combine the syntactic feature words with the keywords and input the combined data into a text generation model pre-trained on training samples annotated with keywords and syntactic feature words. This yields the target questions corresponding to the keywords.
    • Finally, they construct question-answer pairs containing the target questions and the paraphrase sentences corresponding to the keywords in the standard documents.
  • Because the target questions are produced by a text generation model pre-trained on training samples annotated with keywords and syntactic feature words, the quality of the target questions and of the question-answer pairs can be improved.
  • FIG. 2 is a schematic flowchart of a method for generating a question-and-answer corpus based on a text generation model according to one or more embodiments.
  • FIG. 4 is a schematic flowchart of a method for generating a question-and-answer corpus based on a text generation model according to one or more embodiments.
  • FIG. 7 is a block diagram of an apparatus for generating a question-and-answer corpus based on a text generation model according to one or more embodiments.
  • FIG. 8 is a block diagram of a computer device in accordance with one or more embodiments.
  • A method for generating a question-and-answer corpus based on a text generation model is provided. Taking its application to the server in FIG. 1 as an example, the method includes the following steps 202 to 208.
  • Step 202: Obtain historical questions and standard documents, and extract the keywords in the standard documents and the paraphrase sentences corresponding to the keywords.
  • A standard document is the reference document on which the construction of the question-and-answer corpus is based.
  • For example, a standard document may be a normative text containing clauses as well as technical terms and their definitions.
  • The keywords in a standard document are the key words of its clauses and technical terms, such as clause names and technical term names.
  • The paraphrase sentences corresponding to the keywords are the explanations of the clauses and the definitions of the technical terms.
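Step 202 does not fix how the standard document is parsed. The sketch below is a minimal one that rests on a hypothetical layout: each term or clause sits on its own line as `term: definition`, with either an ASCII or a full-width colon. The name `extract_terms` is illustrative only, and real standard documents will usually need a more robust parser.

```python
import re

# Assumed line format: "keyword: paraphrase", with ASCII ":" or full-width "：".
TERM_LINE = re.compile(r"^\s*([^:：]+?)\s*[:：]\s*(.+?)\s*$")


def extract_terms(document: str) -> dict[str, str]:
    """Map each keyword (clause or term name) to its paraphrase sentence."""
    terms: dict[str, str] = {}
    for line in document.splitlines():
        match = TERM_LINE.match(line)
        if match:  # lines without a colon are not treated as term entries
            keyword, paraphrase = match.groups()
            terms[keyword] = paraphrase
    return terms


if __name__ == "__main__":
    doc = "Beneficiary: the person designated to receive the benefit.\nGeneral provisions"
    print(extract_terms(doc))
```

The keyword group cannot contain a colon, so only the first colon on a line splits the keyword from its paraphrase. Any later colons stay inside the paraphrase.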
  • Step 204: Perform word segmentation on the historical questions, identify and discard the entity nouns in them, and obtain the syntactic feature words of the historical questions.
  • Word segmentation is the process of recombining a continuous sequence of characters into a sequence of words according to certain rules.
  • Through word segmentation, a historical question is divided into words, and the part of speech of each word can be identified, such as the nouns and interrogative words in the question.
  • On this basis, the entity nouns in the historical question can be identified.
  • Discarding these entity nouns yields the syntactic feature words of the historical question.
  • The syntactic feature words are composed of the interrogative words and the syntactic structure of the question.
  • Step 206: Combine the syntactic feature words with the keywords, and input the combined data into the pre-trained text generation model to obtain the target questions corresponding to the keywords.
  • The text generation model is trained on training samples annotated with keywords and syntactic feature words.
  • The syntactic feature words and a keyword can be combined by filling the keyword into the vacancy in the syntactic feature words; the result of the combination is a combined question.
  • After the combined question is input into the pre-trained text generation model, the model adjusts and reorganizes it. The model then outputs a target question together with probability data, which is compared against a threshold to yield the target question corresponding to the keyword.
  • Training samples are the data used to train the initially constructed model. Going from the initial model to a final usable model requires multiple rounds of training, validation, and testing, until the model evaluation metrics meet the set requirements.
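Combining syntactic feature words with keywords amounts to filling each keyword into the vacancy of each template. The minimal sketch below assumes templates carry a single `{}` slot, which is an illustrative convention and not one fixed by the patent. It also shows why drawing on many templates and keywords multiplies the number of combined questions: every template pairs with every keyword. The model call is left out because the patent does not name a specific text generation model.

```python
from itertools import product


def combine(template: str, keyword: str, slot: str = "{}") -> str:
    """Fill a keyword into the first vacancy of a syntactic-feature template."""
    # str.replace (rather than str.format) tolerates braces inside keywords.
    return template.replace(slot, keyword, 1)


def combine_all(templates: list[str], keywords: list[str]) -> list[tuple[str, str]]:
    """Return (keyword, combined question) for every template/keyword pairing."""
    return [(kw, combine(t, kw)) for t, kw in product(templates, keywords)]


if __name__ == "__main__":
    templates = ["What does {} mean?", "How is {} calculated?"]
    keywords = ["beneficiary", "waiting period"]
    for keyword, question in combine_all(templates, keywords):
        print(keyword, "|", question)
```

With two templates and two keywords, this yields four combined questions, each of which would then be passed to the text generation model.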
  • Step 208: Construct a question-answer pair containing the target question and the paraphrase sentence, according to the target question and the paraphrase sentence corresponding to the keyword.
  • The server uses the keyword as intermediate linking information to associate the target question with the paraphrase sentence. Based on this association, it constructs a question-answer pair consisting of the target question and the paraphrase sentence.
  • The paraphrase sentence corresponding to the keyword is obtained by parsing the standard document.
  • Once the target question corresponding to the keyword has been obtained, the paraphrase sentence corresponding to the same keyword is used as its answer, forming a question-answer pair.
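Because the keyword links a target question to its paraphrase sentence, step 208 can be read as a join on the keyword. The minimal sketch below follows that reading. The dictionary layouts, the `QAPair` record, and the choice to skip keywords without a paraphrase are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QAPair:
    question: str
    answer: str
    keyword: str  # the linking information between question and answer


def build_qa_pairs(target_questions: dict[str, list[str]],
                   paraphrases: dict[str, str]) -> list[QAPair]:
    """Pair each target question with the paraphrase sentence of its keyword."""
    pairs: list[QAPair] = []
    for keyword, questions in target_questions.items():
        answer = paraphrases.get(keyword)
        if answer is None:
            continue  # no paraphrase sentence in the standard document: skip
        pairs.extend(QAPair(q, answer, keyword) for q in questions)
    return pairs
```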
  • The above method works as follows:
    • It segments the historical questions and discards their entity nouns to obtain syntactic feature words.
    • It combines these with the keywords of the standard documents and inputs the combined data into a text generation model pre-trained on training samples annotated with keywords and syntactic feature words, which yields the target questions corresponding to the keywords.
    • It then constructs question-answer pairs from the target questions and the paraphrase sentences corresponding to the keywords.
  • Drawing on both historical questions and standard documents yields more combinations of keywords and syntactic feature words. Generating the target questions with the pre-trained model can improve the quality of the target questions and of the question-answer pairs.
  • Step 302: Perform word segmentation on the historical question according to part of speech to obtain a word segmentation result.
  • Step 304: Filter the entity nouns out of the word segmentation result to obtain the syntactic structure and interrogative words of the historical question. The syntactic feature words of the historical question are then obtained from this syntactic structure and these interrogative words.
  • The syntactic feature words include the syntactic structure of the question, the interrogative words, and other words that characterize the question; they do not contain specific entity nouns. For example, take the user question "What does beneficiary mean?":
    • "beneficiary" is the entity noun;
    • the syntactic structure is "... is";
    • the interrogative phrase is "what does it mean?";
    • the extracted syntactic feature words can be "... what does it mean?".
  • In this way, entity nouns in historical questions can be filtered out accurately and quickly. This makes it easier to recombine the keywords in the standard documents with the syntactic feature words into new question-and-answer corpus, which increases the amount of corpus generated.
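Steps 302 and 304 can be sketched in a few lines. This illustrative sketch rests on two assumptions that the patent leaves open: the question has already been segmented (for example, by a Chinese word segmenter), and entity nouns are recognized by lookup in a term lexicon. Each run of entity nouns is replaced with a `{}` slot. The interrogative words and structure that remain form the syntactic feature words, with a vacancy where a keyword can later be filled in.

```python
from typing import Iterable


def syntactic_feature_words(tokens: list[str], entity_lexicon: Iterable[str],
                            slot: str = "{}", sep: str = "") -> str:
    """Discard entity nouns from a segmented question, leaving a vacancy.

    tokens: the question after word segmentation (assumed already done).
    entity_lexicon: hypothetical set of known entity nouns (e.g. term names).
    sep: "" to rejoin Chinese text, " " for space-delimited languages.
    """
    lexicon = set(entity_lexicon)
    kept: list[str] = []
    for token in tokens:
        if token in lexicon:
            # A run of consecutive entity nouns becomes a single vacancy.
            if not kept or kept[-1] != slot:
                kept.append(slot)
        else:
            kept.append(token)
    return sep.join(kept)


if __name__ == "__main__":
    # English gloss of the example question "What does beneficiary mean?"
    tokens = ["What", "does", "beneficiary", "mean", "?"]
    print(syntactic_feature_words(tokens, {"beneficiary"}, sep=" "))
```

A lexicon lookup is the simplest stand-in. A part-of-speech tag alone would be too coarse, because ordinary nouns such as "meaning" are also tagged as nouns but belong to the interrogative phrase.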
  • Step 202 (obtaining historical questions and standard documents, and extracting the keywords in the standard documents and the paraphrase sentences corresponding to the keywords) may include steps 402 to 406.
  • Step 402: Obtain a historical question set and standard documents.
  • Step 404: Extract the keywords in the standard documents and the paraphrase sentences corresponding to the keywords.
  • Step 406: Match the historical questions in the historical question set against the keywords by similarity to obtain the historical questions corresponding to the keywords.
  • A historical question set is a data set containing multiple historical questions. Through similarity matching, the historical question in the set with the highest similarity can be selected as the historical question that matches the keyword.
  • Intelligent question answering aims to locate the user's question precisely and to provide personalized information services through interaction with the user. The historical questions used here therefore need to maintain a certain degree of matching with the keywords.
  • The similarity between the user's historical consultation questions and the keywords can be calculated with any of the following measures:
    • the Jaccard similarity coefficient, based on probability statistics;
    • cosine similarity, based on word vectors;
    • Manhattan distance, Euclidean distance, or Minkowski distance.
  • Each user's historical consultation questions are then screened by their similarity to the keyword to obtain the questions related to the keyword.
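Apart from Jaccard, the measures listed above compare vectors. As an illustrative sketch only, the snippet below uses character-count vectors as a stand-in for the word vectors the text mentions; this assumption keeps it self-contained. It computes cosine similarity and the Minkowski distance, which reduces to the Manhattan distance at p = 1 and the Euclidean distance at p = 2.

```python
import math
from collections import Counter


def char_vectors(a: str, b: str) -> tuple[list[int], list[int]]:
    """Aligned character-count vectors for two texts (stand-in for word vectors)."""
    ca, cb = Counter(a), Counter(b)
    vocab = sorted(set(ca) | set(cb))
    return [ca[ch] for ch in vocab], [cb[ch] for ch in vocab]


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def minkowski_distance(u: list[float], v: list[float], p: float = 2.0) -> float:
    """p = 1 gives Manhattan distance, p = 2 gives Euclidean distance."""
    return sum(abs(x - y) ** p for x, y in zip(u, v)) ** (1.0 / p)
```

Cosine similarity grows as texts become more alike, while the distances shrink. Screening by a distance therefore keeps the smallest values rather than the largest.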
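  • Performing similarity matching between the questions in the historical question set and the keywords, to obtain the historical questions corresponding to the keywords, may include:
    • performing word segmentation on each historical question in the set to obtain its word segmentation result;
    • calculating the Jaccard similarity between the entity nouns in the word segmentation result and the keyword, which gives the similarity between the historical question and the keyword;
    • comparing the similarities of the historical questions to filter them and obtain the historical question corresponding to the keyword.
  • Specifically, the target historical question with the largest similarity to the keyword may be selected, and this target historical question is taken as the historical question corresponding to the keyword.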
  • Because the keyword is itself essentially an entity noun, the Jaccard similarity between the entity nouns in the word segmentation result and the keyword identifies, accurately, the historical questions in the set that closely match the keyword. This further increases the probability of generating target questions by combining keywords with the syntactic feature words of the historical questions.
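The Jaccard-based matching of step 406 can be sketched as follows. The patent does not say what units the Jaccard coefficient is computed over. This sketch assumes character sets, which suit short Chinese terms. It also assumes each historical question's entity nouns have already been extracted from its word segmentation result. The question whose best entity-noun match scores highest is taken as the historical question corresponding to the keyword.

```python
from typing import Optional


def jaccard(a: str, b: str) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B| over the character sets of a and b."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def question_similarity(entity_nouns: list[str], keyword: str) -> float:
    """Similarity of a historical question to a keyword: its best entity-noun match."""
    return max((jaccard(noun, keyword) for noun in entity_nouns), default=0.0)


def match_question(entity_nouns_by_question: dict[str, list[str]],
                   keyword: str) -> Optional[tuple[str, float]]:
    """Return the target historical question with the largest similarity, if any."""
    if not entity_nouns_by_question:
        return None
    scored = ((q, question_similarity(nouns, keyword))
              for q, nouns in entity_nouns_by_question.items())
    return max(scored, key=lambda item: item[1])
```

For example, "投保人" and "受益人" share only the character "人" out of five distinct characters, so their Jaccard coefficient is 1/5 = 0.2.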
  • Alternatively, step 202 (obtaining historical questions and standard documents, and extracting the keywords in the standard documents and the paraphrase sentences corresponding to the keywords) may include steps 502 to 506.
  • Step 502: Obtain standard documents from a pre-reviewed document database, and search for the historical questions associated with the document content labels corresponding to the standard documents.
  • Step 506: Extract the keywords in the target text, which is obtained by identifying clause words and technical terms in the standard document, and use the target text as the paraphrase sentence corresponding to the keyword.
  • Drawing on a pre-reviewed document database ensures that all the standard documents obtained are compliant. The paraphrase sentences corresponding to the keywords in these documents serve as the answers in the question-answer pairs. This improves the validity of the generated corpus and helps customer service staff avoid giving wrong answers when they use those answers in practice.
  • By identifying clause words and technical terms, the target text carrying keywords can be selected from the standard document in a targeted manner. The keywords in the target text are then extracted, and the target text is used as the paraphrase sentence corresponding to the keyword.
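  • The training process of the text generation model includes:
    • obtaining consultation questions from user consultation logs;
    • identifying clause words and industry terms in the consultation questions to obtain their keywords;
    • performing word segmentation on the consultation questions and discarding their entity nouns to obtain their syntactic feature words;
    • constructing a training data set, using the combination of each consultation question's keywords and syntactic feature words as input data and the consultation question itself as target output data;
    • training an initial text generation model on the training data set to obtain the text generation model.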
  • A user consultation log is a data file in the question-answering system that records the question-and-answer exchanges between users and customer service. Taking consultation questions from these logs yields questions that better fit real application scenarios.
  • Identifying the clause words and industry terms in a consultation question gives the corresponding keywords. The consultation question is then processed in the same way as the historical questions above: it is segmented and its entity nouns are discarded, which yields its syntactic feature words.
  • The combination of the consultation question's keywords and syntactic feature words is used as input data, and the consultation question itself as target output data, to construct the training data set.
  • The initial text generation model is then trained on this training data set to obtain the text generation model.
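The construction of the training data set can be sketched as follows. The input serialization (`keywords [SEP] template`) and the lexicon-based recognition of clause words and industry terms are illustrative assumptions. The patent states only two things: the combination of keywords and syntactic feature words is the input, and the original consultation question is the target output. Training itself is omitted because no specific model is named.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TrainingExample:
    source: str  # keywords + syntactic feature words (model input)
    target: str  # the original consultation question (target output)


def make_example(tokens: list[str], term_lexicon: set[str],
                 sep: str = "", slot: str = "{}") -> Optional[TrainingExample]:
    """Turn one segmented consultation question into a training example."""
    keywords = [t for t in tokens if t in term_lexicon]
    if not keywords:
        return None  # no clause word or industry term recognized
    template: list[str] = []
    for token in tokens:
        if token in term_lexicon:
            # Entity nouns are discarded, leaving one vacancy per run.
            if not template or template[-1] != slot:
                template.append(slot)
        else:
            template.append(token)
    source = f"{' '.join(keywords)} [SEP] {sep.join(template)}"
    return TrainingExample(source=source, target=sep.join(tokens))


def build_training_set(questions: list[list[str]], term_lexicon: set[str],
                       sep: str = "") -> list[TrainingExample]:
    """Build examples from segmented consultation questions, skipping unusable ones."""
    examples = (make_example(q, term_lexicon, sep) for q in questions)
    return [ex for ex in examples if ex is not None]
```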
  • Applying the resulting text generation model to target question generation can improve the quality of the generated target questions.
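  • The syntactic feature words and keywords may be combined and input into the pre-trained text generation model to obtain candidate questions carrying probability data. When the probability data of a candidate question is greater than a preset probability threshold, the candidate question is used as the target question corresponding to the keyword.
  • When the probability data of a candidate question is not greater than the preset probability threshold, the candidate question is discarded.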
  • The question-and-answer corpus generation method based on a text generation model is further illustrated by taking its application in the insurance field as an example.
  • The server obtains an insurance description document and extracts the terms and definitions in it by parsing the document.
  • The server obtains user questions and extracts their keywords.
  • A text generation model is obtained by training on a pre-built data set. The keywords of the user questions and the keywords in the terms and definitions are input into the model to generate questions. Question-answer pairs are then generated from these questions and the extracted terms and definitions.
  • A question-and-answer corpus generation apparatus based on a text generation model is provided, including a data acquisition module 702, a historical question word segmentation module 704, a target question generation module 706, and a question-answer pair construction module 708, wherein:
  • the data acquisition module 702 is configured to obtain historical questions and standard documents, and to extract the keywords in the standard documents and the paraphrase sentences corresponding to the keywords;
  • the historical question word segmentation module 704 is configured to perform word segmentation on the historical questions, identify and discard the entity nouns in them, and obtain their syntactic feature words;
  • the target question generation module 706 is configured to combine the syntactic feature words with the keywords and input the combined data into the pre-trained text generation model to obtain the target questions corresponding to the keywords, wherein the text generation model is trained on training samples annotated with keywords and syntactic feature words; and
  • the question-answer pair construction module 708 is configured to construct a question-answer pair containing the target question and the paraphrase sentence according to the target question and the paraphrase sentence corresponding to the keyword.
  • The historical question word segmentation module is further configured to:
    • perform word segmentation on the historical question according to part of speech to obtain a word segmentation result;
    • filter entity nouns out of the word segmentation result to obtain the syntactic structure and interrogative words of the historical question;
    • obtain the syntactic feature words of the historical question according to the syntactic structure and interrogative words.
  • The data acquisition module is further configured to:
    • obtain a historical question set and standard documents;
    • extract the keywords in the standard documents and the paraphrase sentences corresponding to the keywords;
    • perform similarity matching between the historical questions in the set and the keywords to obtain the historical questions corresponding to the keywords.
  • The data acquisition module is also configured to:
    • perform word segmentation on the historical questions in the set to obtain their word segmentation results;
    • calculate the Jaccard similarity between the entity nouns in each word segmentation result and the keyword, which gives the similarity between the historical question and the keyword;
    • compare the similarities of the historical questions to filter them and obtain the historical question corresponding to the keyword.
  • The data acquisition module is further configured to:
    • select, based on the similarity between each historical question and the keyword, the target historical question with the largest similarity;
    • take the target historical question as the historical question corresponding to the keyword.
  • The data acquisition module is further configured to:
    • obtain standard documents from a pre-reviewed document database, and search for the historical questions associated with the document content labels corresponding to the standard documents;
    • identify clause words and technical terms in the standard documents to obtain the target text;
    • extract the keywords in the target text and use the target text as the paraphrase sentence corresponding to the keyword.
  • The apparatus further includes a model training module configured to:
    • obtain consultation questions from user consultation logs;
    • identify clause words and industry terms in the consultation questions to obtain their keywords;
    • perform word segmentation on the consultation questions and discard their entity nouns to obtain their syntactic feature words;
    • construct a training data set, using the combination of the keywords and syntactic feature words as input data and the consultation questions as target output data;
    • train an initial text generation model on the training data set to obtain the text generation model.
  • The target question generation module is also configured to combine the syntactic feature words with the keywords and input the combined data into the pre-trained text generation model to obtain candidate questions carrying probability data. When the probability data of a candidate question is greater than the preset probability threshold, the module uses the candidate question as the target question corresponding to the keyword.
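The four modules can be wired together as in the sketch below. It is an architecture sketch, not the patent's implementation, and it makes several assumptions:

- Each module is a plain method.
- The text generation model is injected as a hypothetical `generate` callable that returns (candidate, probability) pairs.
- Entity nouns are recognized by matching the keywords from the standard document, on the reading that a keyword is itself essentially an entity noun.
- Each token that matches a keyword is replaced by its own slot, and only the first slot is filled. This is a simplification.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical model interface: combined question -> [(candidate, probability)].
Generator = Callable[[str], list[tuple[str, float]]]


@dataclass
class QACorpusGenerator:
    generate: Generator
    threshold: float = 0.5
    slot: str = "{}"
    sep: str = ""  # "" for Chinese text, " " for space-delimited text

    def feature_words(self, tokens: list[str], keywords: set[str]) -> str:
        """Word segmentation module (704): replace entity nouns with a vacancy."""
        return self.sep.join(self.slot if t in keywords else t for t in tokens)

    def target_questions(self, template: str, keyword: str) -> list[str]:
        """Target question module (706): fill the slot, generate, apply threshold."""
        combined = template.replace(self.slot, keyword, 1)
        return [q for q, p in self.generate(combined) if p > self.threshold]

    def run(self, segmented_questions: list[list[str]],
            paraphrases: dict[str, str]) -> list[tuple[str, str]]:
        """Run modules 704 to 708 on data from the acquisition module (702).

        paraphrases maps each keyword from the standard document to its paraphrase.
        """
        keywords = set(paraphrases)
        templates = {self.feature_words(tokens, keywords) for tokens in segmented_questions}
        pairs: list[tuple[str, str]] = []
        for template in sorted(t for t in templates if self.slot in t):
            for keyword, answer in sorted(paraphrases.items()):
                # QA pair construction module (708): the keyword links question and answer.
                pairs.extend((q, answer) for q in self.target_questions(template, keyword))
        return pairs
```

Injecting the model as a callable keeps the pipeline testable with a stub and leaves the choice of text generation model open, as the source does.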
  • The above apparatus works as follows:
    • It obtains historical questions and standard documents.
    • It segments the historical questions and discards their entity nouns to obtain syntactic feature words.
    • It combines these with the keywords and inputs the combined data into a text generation model pre-trained on training samples annotated with keywords and syntactic feature words, which yields the target questions.
    • It then constructs question-answer pairs from the target questions and the paraphrase sentences corresponding to the keywords in the standard documents.
  • Using both historical questions and standard documents yields more combinations of keywords and syntactic feature words, and the pre-trained text generation model can improve the quality of the target questions and of the question-answer pairs.
  • Each module of the above apparatus can be implemented in whole or in part by software, hardware, or a combination thereof.
  • The modules can take the form of hardware, either embedded in or independent of the processor of the computer device. They can also take the form of software stored in the memory of the computer device, so that the processor can invoke them and execute the operations corresponding to each module.
  • A computer device is provided; it may be a server, and its internal structure may be as shown in FIG. 8.
  • The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor provides computing and control capabilities.
  • The memory includes a non-volatile or volatile storage medium and an internal memory.
  • The non-volatile or volatile storage medium stores an operating system, computer-readable instructions, and a database.
  • The internal memory provides an environment for running the operating system and the computer-readable instructions stored in the non-volatile or volatile storage medium.
  • The database stores the data used for question-and-answer corpus generation based on the text generation model.
  • The network interface communicates with external terminals over a network connection.
  • When executed by the processor, the computer-readable instructions implement a question-and-answer corpus generation method based on a text generation model.
  • The structure shown in FIG. 8 is only a block diagram of the part of the structure related to the solution of the present application, and it does not limit the computer devices to which the solution applies. A specific computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
  • A computer device includes a memory and one or more processors. The memory stores computer-readable instructions that, when executed by the one or more processors, cause the one or more processors to perform the following steps:
    • obtaining historical questions and standard documents, and extracting the keywords in the standard documents and the paraphrase sentences corresponding to the keywords;
    • performing word segmentation on the historical questions, identifying and discarding their entity nouns, and obtaining their syntactic feature words;
    • combining the syntactic feature words with the keywords, and inputting the combined data into a pre-trained text generation model to obtain the target questions corresponding to the keywords, wherein the text generation model is trained on training samples annotated with keywords and syntactic feature words; and
    • constructing, according to the target question and the paraphrase sentence corresponding to each keyword, a question-answer pair containing the target question and the paraphrase sentence.
  • When executing the computer-readable instructions, the processor further implements the following steps:
    • performing word segmentation on the historical question to obtain a word segmentation result;
    • filtering entity nouns out of the word segmentation result to obtain the syntactic structure and interrogative words of the historical question;
    • obtaining the syntactic feature words of the historical question according to the syntactic structure and interrogative words.
  • When executing the computer-readable instructions, the processor further implements the following steps:
    • selecting, based on the similarity between each historical question and the keyword, the target historical question with the largest similarity;
    • taking the target historical question as the historical question corresponding to the keyword.
  • The above computer device, which implements the question-and-answer corpus generation method based on a text generation model, works as follows:
    • It obtains historical questions and standard documents.
    • It segments the historical questions and discards their entity nouns to obtain syntactic feature words.
    • It combines these with the keywords and inputs the combined data into a text generation model pre-trained on training samples annotated with keywords and syntactic feature words, which yields the target questions corresponding to the keywords.
    • It then constructs question-answer pairs from the target questions and the paraphrase sentences corresponding to the keywords in the standard documents.
  • Using both historical questions and standard documents yields more combinations of keywords and syntactic feature words, and the pre-trained model can improve the quality of the target questions and of the question-answer pairs.
  • One or more computer-readable storage media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:
    • obtaining historical questions and standard documents, and extracting the keywords in the standard documents and the paraphrase sentences corresponding to the keywords;
    • performing word segmentation on the historical questions, identifying and discarding their entity nouns, and obtaining their syntactic feature words;
    • combining the syntactic feature words with the keywords, and inputting the combined data into a pre-trained text generation model to obtain the target questions corresponding to the keywords, wherein the text generation model is trained on training samples annotated with keywords and syntactic feature words; and
    • constructing, according to the target question and the paraphrase sentence corresponding to each keyword, a question-answer pair containing the target question and the paraphrase sentence.
  • In one embodiment, the computer-readable instructions, when executed by the processor, further implement the following steps:
  • Selecting the target historical question with the greatest similarity from the historical questions, and taking the target historical question as the historical question corresponding to the keyword.
  • Target questions obtained from the text generation model pre-trained on training samples annotated with keywords and syntactic feature words improve the quality of the target questions and the question-answer pairs.
  • Any reference to memory, storage, a database, or other media used in the embodiments provided in this application may include at least one of non-volatile and volatile memory.
  • Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, and the like.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM may take various forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

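This application relates to a question-answer corpus generation method based on a text generation model, in the field of artificial intelligence. The method includes the following steps:
- Obtain historical questions and a standard document, and extract the keywords in the standard document and the explanatory sentences corresponding to the keywords.
- Perform word segmentation on the historical questions, then identify and discard their entity nouns to obtain their syntactic feature words.
- Combine the syntactic feature words with the keywords, and input the combined data into a pre-trained text generation model to obtain target questions corresponding to the keywords. The text generation model is trained on training samples annotated with keywords and syntactic feature words.
- Using the target question and the explanatory sentence corresponding to each keyword, construct a question-answer pair that includes the target question and the explanatory sentence.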

Description

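Question-answer corpus generation method and apparatus based on a text generation model
Cross-reference to related applications
This application claims priority to Chinese patent application No. 2020112166427, entitled "Question-answer corpus generation method and apparatus based on a text generation model". That application was filed with the China Patent Office on November 4, 2020, and its entire contents are incorporated herein by reference.
Technical field
This application relates to the field of artificial intelligence. In particular, it relates to a question-answer corpus generation method and apparatus, a computer device, and a storage medium based on a text generation model.
Background
With the development of artificial intelligence technology, AI is being applied in more and more scenarios. Question answering systems are one of its important areas. Many businesses currently need a customer service system to resolve users' questions. Most user questions concentrate on a small number of high-frequency "head" questions, which is the motivation for Frequently Asked Questions (FAQ).
The quantity and quality of the FAQ corpus are the foundation of the whole system. However, there is currently no way to provide a general FAQ corpus with full coverage, so each vertical domain has to build its own FAQ corpus from scratch. Such corpora are usually built by entering FAQs based on historical data.
However, the inventors realized that with current data entry approaches, some of the entered questions do not match their answers closely enough.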
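Summary
According to various embodiments disclosed in this application, a question-answer corpus generation method and apparatus, a computer device, and a storage medium based on a text generation model are provided.
A question-answer corpus generation method based on a text generation model, the method including:
obtaining historical questions and a standard document, and extracting the keywords in the standard document and the explanatory sentences corresponding to the keywords;
performing word segmentation on the historical questions, and identifying and discarding the entity nouns in the historical questions to obtain the syntactic feature words of the historical questions;
combining the syntactic feature words with the keywords, and inputting the combined data into a pre-trained text generation model to obtain target questions corresponding to the keywords, wherein the text generation model is trained on training samples annotated with keywords and syntactic feature words; and
constructing, according to the target question corresponding to a keyword and the explanatory sentence corresponding to that keyword, a question-answer pair that includes the target question and the explanatory sentence.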
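A question-answer corpus generation apparatus based on a text generation model, the apparatus including:
a data acquisition module, configured to obtain historical questions and a standard document, and to extract the keywords in the standard document and the explanatory sentences corresponding to the keywords;
a historical question segmentation module, configured to perform word segmentation on the historical questions, and to identify and discard the entity nouns in the historical questions to obtain their syntactic feature words;
a target question generation module, configured to combine the syntactic feature words with the keywords and input the combined data into a pre-trained text generation model to obtain target questions corresponding to the keywords, wherein the text generation model is trained on training samples annotated with keywords and syntactic feature words; and
a question-answer pair construction module, configured to construct, according to the target question corresponding to a keyword and the explanatory sentence corresponding to that keyword, a question-answer pair that includes the target question and the explanatory sentence.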
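A computer device, including a memory and one or more processors, the memory storing computer-readable instructions that, when executed by the processor, cause the one or more processors to perform the following steps:
obtaining historical questions and a standard document, and extracting the keywords in the standard document and the explanatory sentences corresponding to the keywords;
performing word segmentation on the historical questions, and identifying and discarding the entity nouns in the historical questions to obtain the syntactic feature words of the historical questions;
combining the syntactic feature words with the keywords, and inputting the combined data into a pre-trained text generation model to obtain target questions corresponding to the keywords, wherein the text generation model is trained on training samples annotated with keywords and syntactic feature words; and
constructing, according to the target question corresponding to a keyword and the explanatory sentence corresponding to that keyword, a question-answer pair that includes the target question and the explanatory sentence.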
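One or more computer-readable storage media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:
obtaining historical questions and a standard document, and extracting the keywords in the standard document and the explanatory sentences corresponding to the keywords;
performing word segmentation on the historical questions, and identifying and discarding the entity nouns in the historical questions to obtain the syntactic feature words of the historical questions;
combining the syntactic feature words with the keywords, and inputting the combined data into a pre-trained text generation model to obtain target questions corresponding to the keywords, wherein the text generation model is trained on training samples annotated with keywords and syntactic feature words; and
constructing, according to the target question corresponding to a keyword and the explanatory sentence corresponding to that keyword, a question-answer pair that includes the target question and the explanatory sentence.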
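The above method, apparatus, computer device, and storage medium work as follows. They obtain historical questions and a standard document. They segment the historical questions, then identify and discard their entity nouns to obtain their syntactic feature words. They combine the syntactic feature words with the keywords and input the combined data into a text generation model pre-trained on training samples annotated with keywords and syntactic feature words, which yields target questions corresponding to the keywords. Finally, they construct question-answer pairs that include the target questions and the explanatory sentences corresponding to the keywords in the standard document. Because both the historical questions and the standard document are used, more combinations of keywords and syntactic feature words can be obtained. Target questions produced by the pre-trained model improve the quality of both the target questions and the question-answer pairs.
Details of one or more embodiments of this application are set forth in the accompanying drawings and the description below. Other features and advantages of this application will become apparent from the description, the drawings, and the claims.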
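Brief description of the drawings
The drawings needed for the embodiments are briefly introduced below, to describe the technical solutions more clearly. The drawings described below show only some embodiments of this application. A person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a diagram of an application scenario of the question-answer corpus generation method based on a text generation model according to one or more embodiments;
FIG. 2 is a schematic flowchart of the question-answer corpus generation method based on a text generation model according to one or more embodiments;
FIG. 3 is a schematic flowchart of the question-answer corpus generation method based on a text generation model according to one or more embodiments;
FIG. 4 is a schematic flowchart of the question-answer corpus generation method based on a text generation model according to one or more embodiments;
FIG. 5 is a schematic flowchart of the question-answer corpus generation method based on a text generation model according to one or more embodiments;
FIG. 6 is a schematic flowchart of the question-answer corpus generation method based on a text generation model according to one or more embodiments;
FIG. 7 is a block diagram of the question-answer corpus generation apparatus based on a text generation model according to one or more embodiments; and
FIG. 8 is a block diagram of a computer device according to one or more embodiments.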
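Detailed description
This application is described in further detail below with reference to the drawings and embodiments, to make its technical solutions and advantages clearer. The specific embodiments described here only explain the application and are not intended to limit it.
The method provided by this application can be applied in the environment shown in FIG. 1, where a terminal 102 communicates with a server 104 over a network. The server handles a question-answer corpus generation request from the terminal as follows:
- It obtains historical questions and a standard document according to the request.
- It extracts the keywords in the standard document and the explanatory sentences corresponding to the keywords.
- It performs word segmentation on the historical questions and discards their entity nouns to obtain their syntactic feature words.
- It combines the syntactic feature words with the keywords and inputs the combined data into a pre-trained text generation model to obtain target questions for the keywords. The model is trained on training samples annotated with keywords and syntactic feature words.
- It constructs question-answer pairs from the target question and explanatory sentence of each keyword, and returns the pairs to the terminal 102.
The terminal 102 may be, without limitation, a personal computer, notebook computer, smartphone, tablet computer, or portable wearable device. The server 104 may be a standalone server or a cluster of multiple servers.
In one embodiment, as shown in FIG. 2, a question-answer corpus generation method based on a text generation model is provided. Using its application to the server in FIG. 1 as an example, the method includes the following steps 202 to 208.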
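Step 202: obtain historical questions and a standard document, and extract the keywords in the standard document and the explanatory sentences corresponding to the keywords.
A historical question is a question recorded on the server. It may come from several sources:
- the historical question-and-answer logs of a question answering system;
- custom-edited questions stored on the server;
- data crawled under specified crawler conditions.
A standard document is a reference document used to build the question-answer corpus. Specifically, it may be a normative text containing clause content and technical terms together with their definitions. The keywords in a standard document are the keywords for its clause content and technical terms, such as clause names and technical term names. The explanatory sentence for a keyword is the explanation of a clause or the definition of a technical term.
For example, in the insurance domain the standard document may be an insurance description document, such as a policy specification. Users with questions during the insurance application process consult customer service. Pre-built question-answer pairs let the system quickly find the answer to a consultation question and return it to the user. This improves question-answering efficiency and the user experience.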
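Step 204: perform word segmentation on the historical questions, and identify and discard their entity nouns to obtain their syntactic feature words.
Word segmentation recombines a continuous character sequence into a word sequence according to certain rules. Segmentation splits a historical question into fields and identifies the part of speech of each word, such as the nouns and interrogative words in the question. The entity nouns in the historical question can then be identified. Discarding them yields the question's syntactic feature words, which consist of interrogative words and the syntactic structure.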
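Step 206: combine the syntactic feature words with the keywords, and input the combined data into a pre-trained text generation model to obtain target questions corresponding to the keywords.
The text generation model is trained on training samples annotated with keywords and syntactic feature words.
The syntactic feature words and a keyword are combined by filling the keyword into the slot in the syntactic feature words, which produces a combined question. The combined question is input into the pre-trained text generation model, which adjusts and reorganizes it. The model finally outputs a target question carrying probability data, and this target question corresponds to the keyword.
Specifically, the text generation model is trained on training samples annotated with keywords and syntactic feature words. Training samples are the data used to train the initially constructed model. Turning the initial model into a deployable model requires multiple rounds of training, validation, and testing, until the model evaluation metrics meet the set requirements.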
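The Python sketch below illustrates the combination step under stated assumptions. It is not the patented implementation. The slot left by the removed entity noun is marked with a hypothetical placeholder `[KW]`. Every keyword is paired with every syntactic-feature template, which is how drawing keywords from the standard document and templates from the historical questions yields more combinations. The text generation model that would rewrite each combined question is not modeled. In the example, 受益人 means "beneficiary" and 犹豫期 means "cooling-off period".

```python
from itertools import product
from typing import List, Tuple

SLOT = "[KW]"  # hypothetical marker for the position of the removed entity noun


def fill_slot(feature_words: str, keyword: str) -> str:
    """Fill a keyword into the slot of a syntactic-feature template."""
    if SLOT not in feature_words:
        raise ValueError(f"template has no {SLOT} slot: {feature_words!r}")
    return feature_words.replace(SLOT, keyword, 1)


def combine_all(templates: List[str], keywords: List[str]) -> List[Tuple[str, str]]:
    """Return (keyword, combined question) for every keyword/template pair."""
    return [(kw, fill_slot(tpl, kw)) for kw, tpl in product(keywords, templates)]


templates = ["[KW]是什么意思呢?", "[KW]怎么办理?"]
keywords = ["受益人", "犹豫期"]
for kw, combined in combine_all(templates, keywords):
    print(kw, combined)
```

Two templates and two keywords give four combined questions. Each would then be passed to the trained model.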
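Step 208: construct a question-answer pair that includes the target question and the explanatory sentence, according to the target question and explanatory sentence corresponding to each keyword.
The server uses the keyword as the link between two pieces of data: the target question output by the text generation model for that keyword, and the explanatory sentence extracted for the same keyword. It associates the two and builds a question-answer pair from this association. The explanatory sentence comes from parsing the standard document. The model provides a question for the keyword, and the keyword's explanatory sentence serves as the answer to that question.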
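The keyword-based association in step 208 amounts to a join on the keyword. The minimal Python sketch below assumes the model output and the parsed document have already been collected into two dictionaries keyed by keyword. The dictionary layout, function name, and placeholder answer are illustrative and not taken from the patent.

```python
from typing import Dict, List, Tuple


def build_qa_pairs(
    target_questions: Dict[str, List[str]],
    explanations: Dict[str, str],
) -> List[Tuple[str, str]]:
    """Join target questions and explanatory sentences on their shared keyword.

    A keyword with no explanatory sentence produces no pairs.
    """
    pairs: List[Tuple[str, str]] = []
    for keyword, questions in target_questions.items():
        answer = explanations.get(keyword)
        if answer is None:
            continue  # no answer available for this keyword
        for question in questions:
            pairs.append((question, answer))
    return pairs


questions = {"受益人": ["受益人是什么意思呢?"], "犹豫期": ["犹豫期有多长?"]}
explanations = {"受益人": "(explanatory sentence for 受益人 from the standard document)"}
print(build_qa_pairs(questions, explanations))
```

A keyword that lacks either a generated question or an explanatory sentence produces no pair. As a result, every emitted answer comes from the standard document.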
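Generating question-answer pairs automatically helps considerably when building corpora for vertical domains. This is especially true early in a project, when a large corpus can be generated quickly. The answers in this corpus are taken from existing standard documents, so they raise no compliance issues. Compared with manually built corpora, this also saves the cost of compliance review. Keyword-related content in the standard documents also covers frequent user questions, which helps the question answering system cover high-frequency questions from the start.
The method above works as follows. It obtains historical questions and a standard document. It segments the historical questions, then identifies and discards their entity nouns to obtain their syntactic feature words. It combines the syntactic feature words with the keywords and inputs the combined data into a text generation model pre-trained on training samples annotated with keywords and syntactic feature words, which yields target questions corresponding to the keywords. Finally, it constructs question-answer pairs that include the target questions and the explanatory sentences corresponding to the keywords in the standard document. Because both the historical questions and the standard document are used, more combinations of keywords and syntactic feature words can be obtained. Target questions produced by the pre-trained model improve the quality of both the target questions and the question-answer pairs.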
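In one embodiment, as shown in FIG. 3, step 204 (segmenting the historical questions and discarding their entity nouns to obtain their syntactic feature words) includes steps 302 to 306.
Step 302: segment the historical questions by part of speech to obtain segmentation results;
Step 304: filter entity nouns out of the segmentation results to obtain the syntactic structure and interrogative words of the historical questions; and
Step 306: obtain the syntactic feature words of the historical questions from the syntactic structure and the interrogative words.
In this embodiment, the syntactic feature words are the words that characterize a question, such as its syntactic structure and interrogative words. They contain no concrete entity nouns. Take the consultation question "受益人是什么意思呢?" ("What does 'beneficiary' mean?"):
- "受益人" (beneficiary) is the entity noun.
- The syntactic structure is "……是……" ("… is …").
- The interrogative part is "什么意思呢?" ("what does it mean?").
The extracted syntactic feature words may therefore be "……是什么意思呢?" ("What does … mean?").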
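The Python sketch below illustrates steps 302 to 306. It assumes the question has already been segmented and tagged, with entity nouns carrying a hypothetical `ENT` tag. In practice that tag would come from a segmenter combined with entity recognition, which the patent does not name. The other tags are illustrative only. Entity nouns are replaced by a `[KW]` placeholder, standing in for the "……" slot in the example above. Generic nouns such as 意思 ("meaning") are kept.

```python
from typing import List, Tuple

SLOT = "[KW]"  # hypothetical placeholder marking where an entity noun was removed


def syntactic_feature_words(tagged: List[Tuple[str, str]]) -> str:
    """Drop entity nouns from a tagged question, keeping its syntactic frame.

    `tagged` is a list of (token, tag) pairs. Tokens tagged "ENT" are treated
    as entity nouns (an assumed tagging scheme, not the patent's).
    """
    parts: List[str] = []
    for token, tag in tagged:
        if tag == "ENT":
            # Collapse consecutive entity nouns into a single slot.
            if not parts or parts[-1] != SLOT:
                parts.append(SLOT)
        else:
            parts.append(token)
    return "".join(parts)  # Chinese text is written without spaces


question = [("受益人", "ENT"), ("是", "v"), ("什么", "r"), ("意思", "n"), ("呢", "y"), ("?", "x")]
print(syntactic_feature_words(question))  # [KW]是什么意思呢?
```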
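In the above embodiment, segmenting the historical questions by part of speech lets entity nouns be filtered out quickly and accurately. The keywords in the standard document can then be recombined with the syntactic feature words to construct new question-answer corpus, which increases the amount of corpus generated.
In one embodiment, as shown in FIG. 4, step 202 (obtaining historical questions and a standard document, and extracting keywords and their explanatory sentences) includes steps 402 to 406.
Step 402: obtain a historical question set and a standard document;
Step 404: extract the keywords in the standard document and the explanatory sentences corresponding to the keywords; and
Step 406: perform similarity matching between the historical questions in the historical question set and the keywords to obtain the historical questions corresponding to the keywords.
A historical question set is a data set containing multiple historical questions. Similarity matching selects the question in the set with the highest similarity as the historical question that matches a keyword.
Intelligent question answering works in a one-question, one-answer form. It pinpoints the user's question and provides personalized information through interaction with the user. To meet users' needs, the question and the answer in a question-answer pair must match to a certain degree. Similarity can be computed in several ways:
- the Jaccard similarity coefficient, a set-based statistical measure;
- cosine similarity on word vectors;
- Manhattan distance;
- Euclidean distance;
- Minkowski distance.
The similarity between each historical user consultation question and a keyword gives a matching result. The historical consultation questions are then filtered by degree of similarity to obtain those related to the keyword.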
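For reference, the vector-based measures listed above can be written out directly in Python. This sketch assumes the question and keyword have already been turned into numeric vectors, for example word vectors, which the patent leaves unspecified. Cosine similarity grows as the texts become more alike. The Manhattan, Euclidean, and Minkowski values are distances, so smaller values mean more similar. Manhattan and Euclidean distance are the Minkowski distance of order p = 1 and p = 2.

```python
import math
from typing import Sequence


def _check(u: Sequence[float], v: Sequence[float]) -> None:
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is zero."""
    _check(u, v)
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def minkowski_distance(u: Sequence[float], v: Sequence[float], p: float) -> float:
    """Minkowski distance of order p (p >= 1)."""
    _check(u, v)
    if p < 1:
        raise ValueError("p must be >= 1")
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


def manhattan_distance(u: Sequence[float], v: Sequence[float]) -> float:
    return minkowski_distance(u, v, 1)


def euclidean_distance(u: Sequence[float], v: Sequence[float]) -> float:
    return minkowski_distance(u, v, 2)


print(euclidean_distance([0, 0], [3, 4]))  # 5.0
```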
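In one embodiment, performing similarity matching between the questions in the historical question set and the keywords to obtain the historical questions corresponding to the keywords includes:
performing word segmentation on the historical questions in the historical question set to obtain the segmentation result for each historical question;
computing the Jaccard similarity between the entity nouns in the segmentation result and the keyword to obtain the similarity between the historical question and the keyword; and
filtering the historical questions by comparing their similarities to obtain the historical question corresponding to the keyword.
Further, filtering the historical questions by comparing their similarities to obtain the historical question corresponding to the keyword includes:
selecting, from the historical questions and based on the similarity between each historical question and the keyword, the target historical question with the greatest similarity; and
taking the target historical question as the historical question corresponding to the keyword.
In this embodiment, a keyword is itself essentially an entity noun. Computing the Jaccard similarity between the entity nouns in the segmentation result and the keyword therefore finds the historical questions in the set that match the keyword well. This in turn raises the probability value of the target question generated by combining the keyword with the syntactic feature words of the historical question.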
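The Python sketch below shows the Jaccard matching and maximum-similarity selection of this embodiment. The patent does not say over which units the Jaccard coefficient is taken. This sketch assumes character sets, which suits short Chinese terms. It also scores each historical question by its best-matching entity noun. Both choices are assumptions.

```python
from typing import Dict, List, Optional


def jaccard(a: str, b: str) -> float:
    """Jaccard coefficient |A & B| / |A | B| over the character sets of two strings."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


def question_similarity(entity_nouns: List[str], keyword: str) -> float:
    """Score a historical question by its best-matching entity noun."""
    return max((jaccard(noun, keyword) for noun in entity_nouns), default=0.0)


def match_historical_question(
    entity_nouns_by_question: Dict[str, List[str]], keyword: str
) -> Optional[str]:
    """Return the historical question with the greatest similarity to the keyword."""
    if not entity_nouns_by_question:
        return None
    return max(
        entity_nouns_by_question,
        key=lambda q: question_similarity(entity_nouns_by_question[q], keyword),
    )


history = {
    "受益人是什么意思呢?": ["受益人"],
    "投保人可以变更吗?": ["投保人"],
}
print(match_historical_question(history, "受益人"))  # 受益人是什么意思呢?
```

Here 投保人 ("policyholder") shares only the character 人 with 受益人, giving a Jaccard value of 1/5.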
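In one embodiment, as shown in FIG. 5, step 202 (obtaining historical questions and a standard document, and extracting keywords and their explanatory sentences) includes steps 502 to 506.
Step 502: obtain a standard document from a pre-approved document database, and look up the historical questions associated with the standard document's content tag;
Step 504: perform clause-term and technical-term recognition on the standard document to obtain target text; and
Step 506: extract the keywords from the target text, and use the target text as the explanatory sentence corresponding to those keywords.
Because the document database is pre-approved, every standard document obtained from it is compliant. The explanatory sentences for keywords in the standard documents serve as the answers in question-answer pairs. This improves the validity of the generated corpus and prevents wrong answers when customer service later replies using those answers.
Each standard document carries a document content tag. Looking up historical questions associated with that tag ensures they are related to the standard document. The data selection process therefore already guarantees some degree of match between the historical questions and the keywords in the standard document.
In this embodiment, recognizing clause terms and technical terms allows the target text that contains keywords to be selected from the standard document. Keywords are then extracted from the target text, and the target text is used as their explanatory sentence.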
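The Python sketch below approximates steps 502 to 506 and is illustrative only. It makes three assumptions:
- Term recognition is a lookup against a known lexicon of clause words and technical terms (a hypothetical input).
- The first sentence that mentions a term is taken as its target text and explanatory sentence.
- The document-content-tag lookup is a dictionary from tag to historical questions.

```python
import re
from typing import Dict, Iterable, List


def split_sentences(document: str) -> List[str]:
    """Split on Chinese and ASCII sentence-ending punctuation (simplified)."""
    parts = re.split(r"(?<=[。!?!?])", document)
    return [p.strip() for p in parts if p.strip()]


def extract_keywords(document: str, term_lexicon: Iterable[str]) -> Dict[str, str]:
    """Map each recognized term to the first sentence (target text) mentioning it."""
    sentences = split_sentences(document)
    result: Dict[str, str] = {}
    for term in term_lexicon:
        for sentence in sentences:
            if term in sentence:
                result[term] = sentence
                break
    return result


def questions_for_document(
    doc_tags: Iterable[str], questions_by_tag: Dict[str, List[str]]
) -> List[str]:
    """Collect historical questions associated with any of the document's tags."""
    found: List[str] = []
    for tag in doc_tags:
        for question in questions_by_tag.get(tag, []):
            if question not in found:
                found.append(question)
    return found


document = "受益人是指保险合同中指定的领取保险金的人。犹豫期内投保人可以申请解除合同。"
print(extract_keywords(document, ["受益人", "犹豫期", "免赔额"]))
```

In the example, 免赔额 ("deductible") does not appear in the document, so it yields no keyword.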
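In one embodiment, the training process of the text generation model includes:
obtaining consultation questions from user consultation logs;
performing clause-term and industry technical-term recognition on the consultation questions to obtain their keywords, and performing word segmentation on the consultation questions and discarding their entity nouns to obtain their syntactic feature words;
constructing a training data set that uses the keywords and syntactic feature words of each consultation question as input data and the consultation question itself as the target output data; and
training an initial text generation model on the training data set to obtain the text generation model.
In this embodiment, a user consultation log is a data file in the question answering system that records exchanges between users and customer service. Consultation questions taken from it are closer to real application scenarios. The training data set is built as follows:
- Clause-term and industry technical-term recognition on each consultation question yields its keywords.
- The question is processed the same way as historical questions: it is segmented and its entity nouns are discarded, which yields its syntactic feature words.
- The combination of keywords and syntactic feature words is the input data, and the consultation question is the target output.
The initial text generation model is then trained on this data set to obtain the text generation model.
In this embodiment, applying the text generation model to target question generation improves the quality of the generated target questions.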
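The Python sketch below shows how the training data set could be constructed. Several parts are hypothetical, chosen only for illustration:
- the serialization of the model input as `keywords [SEP] feature words`;
- the `ENT` tag on entity nouns;
- the term lexicon that decides which entity nouns count as clause or technical terms.

The training of the generation model itself is omitted.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple

SLOT = "[KW]"    # hypothetical slot marker
SEP = " [SEP] "  # hypothetical separator between keywords and feature words

Tagged = List[Tuple[str, str]]  # (token, tag); "ENT" marks an entity noun (assumed)


@dataclass(frozen=True)
class TrainingExample:
    source: str  # model input: keywords + syntactic feature words
    target: str  # model output: the original consultation question


def make_example(tagged: Tagged, term_lexicon: Set[str]) -> Optional[TrainingExample]:
    """Turn one tagged consultation question into an (input, target) example."""
    question = "".join(token for token, _ in tagged)
    keywords = [tok for tok, tag in tagged if tag == "ENT" and tok in term_lexicon]
    if not keywords:
        return None  # no clause or technical term recognized: skip this question
    parts: List[str] = []
    for token, tag in tagged:
        if tag == "ENT":
            if not parts or parts[-1] != SLOT:
                parts.append(SLOT)
        else:
            parts.append(token)
    feature_words = "".join(parts)
    return TrainingExample(source=",".join(keywords) + SEP + feature_words, target=question)


def build_dataset(logs: Iterable[Tagged], term_lexicon: Set[str]) -> List[TrainingExample]:
    examples = (make_example(tagged, term_lexicon) for tagged in logs)
    return [ex for ex in examples if ex is not None]


logs = [
    [("受益人", "ENT"), ("是", "v"), ("什么", "r"), ("意思", "n"), ("呢", "y"), ("?", "x")],
    [("你好", "l")],
]
for example in build_dataset(logs, {"受益人"}):
    print(example)
```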
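In one embodiment, combining the syntactic feature words with the keywords and inputting the combined data into the pre-trained text generation model to obtain target questions corresponding to the keywords includes:
combining the syntactic feature words with the keywords, and inputting the combined data into the pre-trained text generation model to obtain candidate questions carrying probability data; and
when the probability data of a candidate question is greater than a preset probability threshold, taking the candidate question as the target question corresponding to the keyword.
Specifically, combining the syntactic feature words with the keywords and inputting the combined data into the pre-trained text generation model to obtain candidate questions carrying probability data includes:
filling the keyword into the slot in the syntactic feature words to obtain a combined question; and
inputting the combined question into the pre-trained text generation model to obtain candidate questions carrying probability data.
In this embodiment, a candidate question whose probability data is not greater than the preset probability threshold is discarded. Filtering candidates against the preset threshold further ensures that the resulting target questions meet requirements and closely match the keywords.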
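The Python sketch below shows the threshold filtering. The trained model is replaced by a hypothetical stand-in, `canned_model`, that returns fixed (candidate question, probability) pairs. A candidate is kept only if its probability is strictly greater than the preset threshold. A candidate exactly at the threshold is therefore discarded, matching the "not greater than" rule above.

```python
from typing import Callable, Dict, List, Tuple

# A generator maps a combined question to (candidate question, probability) pairs.
Generator = Callable[[str], List[Tuple[str, float]]]


def select_target_questions(
    combined: Dict[str, List[str]], generate: Generator, threshold: float
) -> Dict[str, List[str]]:
    """For each keyword, keep candidates whose probability exceeds the threshold."""
    targets: Dict[str, List[str]] = {}
    for keyword, combined_questions in combined.items():
        kept: List[str] = []
        for combined_question in combined_questions:
            for candidate, prob in generate(combined_question):
                if prob > threshold:  # prob <= threshold: candidate is discarded
                    kept.append(candidate)
        targets[keyword] = kept
    return targets


# Hypothetical stand-in for the trained text generation model.
CANNED = {
    "受益人是什么意思呢?": [
        ("受益人是什么意思?", 0.91),
        ("受益人是啥意思?", 0.50),
        ("受益人什么意思呢呢?", 0.12),
    ]
}


def canned_model(combined_question: str) -> List[Tuple[str, float]]:
    return CANNED.get(combined_question, [])


print(select_target_questions({"受益人": ["受益人是什么意思呢?"]}, canned_model, 0.5))
```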
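In an application example, shown in FIG. 6, the method is applied in the insurance domain. The steps are as follows:
- The server obtains an insurance description document and parses it to extract its clauses and their explanations.
- The server obtains user questions and extracts keywords describing how users phrase their questions.
- A pre-built data set for the generation model is used to train the model, which yields the text generation model.
- The user phrasing keywords and the keywords from the clauses and explanations are input into the model to generate questions.
- Question-answer pairs are generated from these questions and the extracted clauses and explanations.
The steps in the flowcharts of the above embodiments are displayed in the order indicated by the arrows, but they are not necessarily executed in that order. Unless explicitly stated herein, their execution has no strict order and they may run in other orders. Some of the steps may also include multiple sub-steps or stages. These sub-steps or stages need not finish at the same moment or run in sequence. They may run at different times, in turn, or alternately with other steps or with parts of their sub-steps or stages.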
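In one embodiment, as shown in FIG. 7, a question-answer corpus generation apparatus based on a text generation model is provided. It includes a data acquisition module 702, a historical question segmentation module 704, a target question generation module 706, and a question-answer pair construction module 708, where:
the data acquisition module 702 is configured to obtain historical questions and a standard document, and to extract the keywords in the standard document and the explanatory sentences corresponding to the keywords;
the historical question segmentation module 704 is configured to perform word segmentation on the historical questions, and to identify and discard the entity nouns in the historical questions to obtain their syntactic feature words;
the target question generation module 706 is configured to combine the syntactic feature words with the keywords and input the combined data into a pre-trained text generation model to obtain target questions corresponding to the keywords, wherein the text generation model is trained on training samples annotated with keywords and syntactic feature words; and
the question-answer pair construction module 708 is configured to construct a question-answer pair that includes the target question and the explanatory sentence, according to the target question and explanatory sentence corresponding to each keyword.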
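The Python sketch below wires the four modules together. It is a simplified illustration rather than the apparatus itself, and it rests on three assumptions:
- The standard document is already parsed into a keyword-to-explanation dictionary.
- Historical questions arrive pre-tagged, with a hypothetical `ENT` tag on entity nouns.
- The text generation model is injected as a callable that returns (candidate, probability) pairs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

SLOT = "[KW]"  # hypothetical slot marker
Tagged = List[Tuple[str, str]]  # (token, tag); "ENT" marks an entity noun (assumed)
Generator = Callable[[str], List[Tuple[str, float]]]  # stand-in for the trained model


@dataclass
class QACorpusGenerator:
    generate: Generator
    threshold: float = 0.5

    # Data acquisition module (702): keyword -> explanatory sentence.
    def acquire(self, glossary: Dict[str, str]) -> Dict[str, str]:
        return {k.strip(): v.strip() for k, v in glossary.items() if k.strip() and v.strip()}

    # Historical question segmentation module (704): drop entity nouns.
    def feature_words(self, tagged: Tagged) -> str:
        parts: List[str] = []
        for token, tag in tagged:
            if tag == "ENT":
                if not parts or parts[-1] != SLOT:
                    parts.append(SLOT)
            else:
                parts.append(token)
        return "".join(parts)

    # Target question generation module (706): fill slot, generate, filter.
    def target_questions(self, keyword: str, templates: List[str]) -> List[str]:
        kept: List[str] = []
        for template in templates:
            if SLOT not in template:
                continue  # nothing to fill
            for candidate, prob in self.generate(template.replace(SLOT, keyword, 1)):
                if prob > self.threshold:
                    kept.append(candidate)
        return kept

    # Question-answer pair construction module (708): join on the keyword.
    def run(self, glossary: Dict[str, str], history: List[Tagged]) -> List[Tuple[str, str]]:
        explanations = self.acquire(glossary)
        templates = sorted({self.feature_words(t) for t in history})
        pairs: List[Tuple[str, str]] = []
        for keyword, answer in explanations.items():
            for question in self.target_questions(keyword, templates):
                pairs.append((question, answer))
        return pairs


def echo_model(question: str) -> List[Tuple[str, float]]:
    return [(question, 0.9)]  # hypothetical model: echoes input with a fixed probability


generator = QACorpusGenerator(generate=echo_model)
history = [[("投保人", "ENT"), ("是", "v"), ("什么", "r"), ("意思", "n"), ("呢", "y"), ("?", "x")]]
print(generator.run({"受益人": "(explanatory sentence)"}, history))
```

Injecting the model as a callable keeps the module boundaries testable without a trained model.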
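In one embodiment, the historical question segmentation module is further configured to:
- segment the historical questions by part of speech to obtain segmentation results;
- filter entity nouns out of the segmentation results to obtain the syntactic structure and interrogative words of the historical questions;
- obtain the syntactic feature words of the historical questions from the syntactic structure and interrogative words.
In one embodiment, the data acquisition module is further configured to:
- obtain a historical question set and a standard document;
- extract the keywords in the standard document and their explanatory sentences;
- match the historical questions in the set against the keywords by similarity to obtain the historical questions corresponding to the keywords.
In one embodiment, the data acquisition module is further configured to:
- segment the historical questions in the historical question set to obtain the segmentation result for each;
- compute the Jaccard similarity between the entity nouns in the segmentation result and the keyword to obtain the similarity between the historical question and the keyword;
- filter the historical questions by comparing their similarities to obtain the historical question corresponding to the keyword.
In one embodiment, the data acquisition module is further configured to select the historical question with the greatest similarity to the keyword as the target historical question, and to take it as the historical question corresponding to the keyword.
In one embodiment, the data acquisition module is further configured to:
- obtain a standard document from a pre-approved document database, and look up historical questions associated with its document content tag;
- perform clause-term and technical-term recognition on the standard document to obtain target text;
- extract keywords from the target text, and use the target text as their explanatory sentence.
In one embodiment, the apparatus further includes a model training module, configured to:
- obtain consultation questions from user consultation logs;
- recognize clause terms and industry technical terms in the consultation questions to obtain their keywords;
- segment the consultation questions and discard their entity nouns to obtain their syntactic feature words;
- build a training data set that uses each question's keywords and syntactic feature words as input data and the question itself as the target output;
- train an initial text generation model on the training data set to obtain the text generation model.
In one embodiment, the target question generation module is further configured to:
- combine the syntactic feature words with the keywords and input the combined data into the pre-trained text generation model to obtain candidate questions carrying probability data;
- take a candidate question as the target question for the keyword when its probability data is greater than a preset probability threshold.
In one embodiment, the target question generation module is further configured to:
- fill the keyword into the slot in the syntactic feature words to obtain a combined question;
- input the combined question into the pre-trained text generation model to obtain candidate questions carrying probability data.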
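The apparatus above works as follows. It obtains historical questions and a standard document. It segments the historical questions, then identifies and discards their entity nouns to obtain their syntactic feature words. It combines the syntactic feature words with the keywords and inputs the combined data into a text generation model pre-trained on training samples annotated with keywords and syntactic feature words, which yields target questions corresponding to the keywords. Finally, it constructs question-answer pairs that include the target questions and the explanatory sentences corresponding to the keywords in the standard document. Because both the historical questions and the standard document are used, more combinations of keywords and syntactic feature words can be obtained. Target questions produced by the pre-trained model improve the quality of both the target questions and the question-answer pairs.
For specific limitations on the apparatus, refer to the limitations on the method above; they are not repeated here. Each module of the apparatus may be implemented wholly or partly in software, hardware, or a combination of the two. In hardware form, the modules may be embedded in or independent of a processor in the computer device. In software form, they may be stored in the device's memory so that the processor can invoke them to perform the corresponding operations.
In one embodiment, a computer device is provided, which may be a server whose internal structure may be as shown in FIG. 8. The device includes a processor, a memory, and a network interface connected through a system bus:
- The processor provides computing and control capabilities.
- The memory includes a non-volatile or volatile storage medium and an internal memory. The storage medium stores an operating system, computer-readable instructions, and a database. The internal memory provides an environment for running the operating system and computer-readable instructions stored in the storage medium.
- The database stores data for question-answer corpus generation based on the text generation model.
- The network interface communicates with external terminals over a network.
When executed by the processor, the computer-readable instructions implement a question-answer corpus generation method based on a text generation model.
A person skilled in the art can understand that FIG. 8 is only a block diagram of part of the structure related to this application's solution. It does not limit the computer devices to which the solution applies. A specific computer device may include more or fewer components than shown, combine some components, or arrange components differently.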
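A computer device, including a memory and one or more processors, the memory storing computer-readable instructions that, when executed by the processor, cause the one or more processors to perform the following steps:
Obtaining historical questions and a standard document, and extracting the keywords in the standard document and the explanatory sentences corresponding to the keywords; performing word segmentation on the historical questions, and identifying and discarding their entity nouns to obtain their syntactic feature words; combining the syntactic feature words with the keywords and inputting the combined data into a pre-trained text generation model to obtain target questions corresponding to the keywords, wherein the text generation model is trained on training samples annotated with keywords and syntactic feature words; and constructing a question-answer pair that includes the target question and the explanatory sentence, according to the target question and explanatory sentence corresponding to each keyword.
In one embodiment, when executing the computer-readable instructions, the processor further implements the following steps:
Segmenting the historical questions by part of speech to obtain segmentation results; filtering entity nouns out of the segmentation results to obtain the syntactic structure and interrogative words of the historical questions; and obtaining the syntactic feature words of the historical questions from the syntactic structure and interrogative words.
In one embodiment, when executing the computer-readable instructions, the processor further implements the following steps:
Obtaining a historical question set and a standard document; extracting the keywords in the standard document and their explanatory sentences; and performing similarity matching between the historical questions in the set and the keywords to obtain the historical questions corresponding to the keywords.
In one embodiment, when executing the computer-readable instructions, the processor further implements the following steps:
Performing word segmentation on the historical questions in the historical question set to obtain the segmentation result for each; computing the Jaccard similarity between the entity nouns in the segmentation result and the keyword to obtain the similarity between the historical question and the keyword; and filtering the historical questions by comparing their similarities to obtain the historical question corresponding to the keyword.
In one embodiment, when executing the computer-readable instructions, the processor further implements the following steps:
Selecting, based on the similarity between each historical question and the keyword, the target historical question with the greatest similarity; and taking the target historical question as the historical question corresponding to the keyword.
In one embodiment, when executing the computer-readable instructions, the processor further implements the following steps:
Obtaining a standard document from a pre-approved document database, and looking up historical questions associated with its document content tag; performing clause-term and technical-term recognition on the standard document to obtain target text; and extracting keywords from the target text and using the target text as their explanatory sentence.
In one embodiment, when executing the computer-readable instructions, the processor further implements the following steps:
Obtaining consultation questions from user consultation logs; performing clause-term and industry technical-term recognition on the consultation questions to obtain their keywords, and performing word segmentation on the consultation questions and discarding their entity nouns to obtain their syntactic feature words; constructing a training data set with the keywords and syntactic feature words of each consultation question as input data and the consultation question as target output data; and training an initial text generation model on the training data set to obtain the text generation model.
In one embodiment, when executing the computer-readable instructions, the processor further implements the following steps:
Combining the syntactic feature words with the keywords and inputting the combined data into the pre-trained text generation model to obtain candidate questions carrying probability data; and, when the probability data of a candidate question is greater than a preset probability threshold, taking the candidate question as the target question corresponding to the keyword.
In one embodiment, when executing the computer-readable instructions, the processor further implements the following steps:
Filling the keyword into the slot in the syntactic feature words to obtain a combined question; and inputting the combined question into the pre-trained text generation model to obtain candidate questions carrying probability data.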
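The computer device above, which implements the question-answer corpus generation method based on a text generation model, works as follows. It obtains historical questions and a standard document. It segments the historical questions, then identifies and discards their entity nouns to obtain their syntactic feature words. It combines the syntactic feature words with the keywords and inputs the combined data into a text generation model pre-trained on training samples annotated with keywords and syntactic feature words, which yields target questions corresponding to the keywords. Finally, it constructs question-answer pairs that include the target questions and the explanatory sentences corresponding to the keywords in the standard document. Because both the historical questions and the standard document are used, more combinations of keywords and syntactic feature words can be obtained. Target questions produced by the pre-trained model improve the quality of both the target questions and the question-answer pairs.
One or more computer-readable storage media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:
Obtaining historical questions and a standard document, and extracting the keywords in the standard document and the explanatory sentences corresponding to the keywords; performing word segmentation on the historical questions, and identifying and discarding their entity nouns to obtain their syntactic feature words; combining the syntactic feature words with the keywords and inputting the combined data into a pre-trained text generation model to obtain target questions corresponding to the keywords, wherein the text generation model is trained on training samples annotated with keywords and syntactic feature words; and constructing a question-answer pair that includes the target question and the explanatory sentence, according to the target question and explanatory sentence corresponding to each keyword.
In one embodiment, when executed by the processor, the computer-readable instructions further implement the following steps:
Segmenting the historical questions by part of speech to obtain segmentation results; filtering entity nouns out of the segmentation results to obtain the syntactic structure and interrogative words of the historical questions; and obtaining the syntactic feature words of the historical questions from the syntactic structure and interrogative words.
In one embodiment, when executed by the processor, the computer-readable instructions further implement the following steps:
Obtaining a historical question set and a standard document; extracting the keywords in the standard document and their explanatory sentences; and performing similarity matching between the historical questions in the set and the keywords to obtain the historical questions corresponding to the keywords.
In one embodiment, when executed by the processor, the computer-readable instructions further implement the following steps:
Performing word segmentation on the historical questions in the historical question set to obtain the segmentation result for each; computing the Jaccard similarity between the entity nouns in the segmentation result and the keyword to obtain the similarity between the historical question and the keyword; and filtering the historical questions by comparing their similarities to obtain the historical question corresponding to the keyword.
In one embodiment, when executed by the processor, the computer-readable instructions further implement the following steps:
Selecting, based on the similarity between each historical question and the keyword, the target historical question with the greatest similarity; and taking the target historical question as the historical question corresponding to the keyword.
In one embodiment, when executed by the processor, the computer-readable instructions further implement the following steps:
Obtaining a standard document from a pre-approved document database, and looking up historical questions associated with its document content tag; performing clause-term and technical-term recognition on the standard document to obtain target text; and extracting keywords from the target text and using the target text as their explanatory sentence.
In one embodiment, when executed by the processor, the computer-readable instructions further implement the following steps:
Obtaining consultation questions from user consultation logs; performing clause-term and industry technical-term recognition on the consultation questions to obtain their keywords, and performing word segmentation on the consultation questions and discarding their entity nouns to obtain their syntactic feature words; constructing a training data set with the keywords and syntactic feature words of each consultation question as input data and the consultation question as target output data; and training an initial text generation model on the training data set to obtain the text generation model.
In one embodiment, when executed by the processor, the computer-readable instructions further implement the following steps:
Combining the syntactic feature words with the keywords and inputting the combined data into the pre-trained text generation model to obtain candidate questions carrying probability data; and, when the probability data of a candidate question is greater than a preset probability threshold, taking the candidate question as the target question corresponding to the keyword.
In one embodiment, when executed by the processor, the computer-readable instructions further implement the following steps:
Filling the keyword into the slot in the syntactic feature words to obtain a combined question; and inputting the combined question into the pre-trained text generation model to obtain candidate questions carrying probability data.
The computer storage medium above, which implements the question-answer corpus generation method based on a text generation model, works as follows. It obtains historical questions and a standard document. It segments the historical questions, then identifies and discards their entity nouns to obtain their syntactic feature words. It combines the syntactic feature words with the keywords and inputs the combined data into a text generation model pre-trained on training samples annotated with keywords and syntactic feature words, which yields target questions corresponding to the keywords. Finally, it constructs question-answer pairs that include the target questions and the explanatory sentences corresponding to the keywords in the standard document. Because both the historical questions and the standard document are used, more combinations of keywords and syntactic feature words can be obtained. Target questions produced by the pre-trained model improve the quality of both the target questions and the question-answer pairs.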
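A person of ordinary skill in the art can understand that all or part of the processes of the above method embodiments may be carried out by computer-readable instructions directing the relevant hardware. The instructions may be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, a database, or other media in the embodiments of this application may include at least one of non-volatile and volatile memory:
- Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, and the like.
- Volatile memory may include random access memory (RAM) or external cache memory.
By way of illustration and not limitation, RAM may take various forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM).
The technical features of the above embodiments may be combined in any way. For brevity, not every possible combination is described. Any combination of these features that involves no contradiction should still be regarded as within the scope of this specification.
The above embodiments describe only several implementations of this application. Although the description is fairly specific and detailed, it should not be read as limiting the scope of the patent. A person of ordinary skill in the art may make variations and improvements without departing from the concept of this application, and these all fall within its protection scope. The protection scope of this patent is therefore defined by the appended claims.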

Claims (20)

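  1. A question-answer corpus generation method based on a text generation model, the method comprising:
    obtaining historical questions and a standard document, and extracting keywords in the standard document and explanatory sentences corresponding to the keywords;
    performing word segmentation on the historical questions, and identifying and discarding entity nouns in the historical questions to obtain syntactic feature words of the historical questions;
    combining the syntactic feature words with the keywords, and inputting the combined data into a pre-trained text generation model to obtain target questions corresponding to the keywords, wherein the text generation model is trained on training samples annotated with keywords and syntactic feature words; and
    constructing, according to the target question corresponding to the keyword and the explanatory sentence corresponding to the keyword, a question-answer pair comprising the target question and the explanatory sentence.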
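  2. The method according to claim 1, wherein performing word segmentation on the historical questions, and identifying and discarding the entity nouns in the historical questions to obtain the syntactic feature words of the historical questions comprises:
    segmenting the historical questions by part of speech to obtain a segmentation result;
    filtering entity nouns out of the segmentation result to obtain a syntactic structure and interrogative words of the historical questions; and
    obtaining the syntactic feature words of the historical questions according to the syntactic structure and the interrogative words.
  3. The method according to claim 1, wherein obtaining historical questions and a standard document, and extracting keywords in the standard document and explanatory sentences corresponding to the keywords comprises:
    obtaining a historical question set and the standard document;
    extracting the keywords in the standard document and the explanatory sentences corresponding to the keywords; and
    performing similarity matching between historical questions in the historical question set and the keywords to obtain historical questions corresponding to the keywords.
  4. The method according to claim 1, wherein performing similarity matching between the questions in the historical question set and the keywords to obtain historical questions corresponding to the keywords comprises:
    performing word segmentation on historical questions in the historical question set to obtain segmentation results corresponding to the historical questions;
    computing a Jaccard similarity between the entity nouns in the segmentation results and the keyword to obtain a similarity between the historical question and the keyword; and
    filtering the historical questions by comparing the similarities corresponding to the historical questions to obtain the historical question corresponding to the keyword.
  5. The method according to claim 4, wherein filtering the historical questions by comparing the similarities corresponding to the historical questions to obtain the historical question corresponding to the keyword comprises:
    selecting from the historical questions, based on the similarity between each historical question and the keyword, a target historical question with the greatest similarity; and
    taking the target historical question as the historical question corresponding to the keyword.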
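  6. The method according to claim 1, wherein obtaining historical questions and a standard document, and extracting keywords in the standard document and explanatory sentences corresponding to the keywords comprises:
    obtaining the standard document from a pre-approved document database, and looking up, according to a document content tag corresponding to the standard document, historical questions associated with the document content tag;
    performing clause-term and technical-term recognition on the standard document to obtain target text; and
    extracting the keywords in the target text, and using the target text as the explanatory sentence corresponding to the keywords.
  7. The method according to claim 1, wherein a training process of the text generation model comprises:
    obtaining consultation questions from a user consultation log;
    performing clause-term and industry technical-term recognition on the consultation questions to obtain keywords in the consultation questions, and performing word segmentation on the consultation questions and discarding entity nouns in the questions to obtain syntactic feature words in the consultation questions;
    constructing a training data set with the keywords and syntactic feature words corresponding to the consultation questions as input data and the consultation questions as target output data; and
    training an initial text generation model according to the training data set to obtain the text generation model.
  8. The method according to claim 1, wherein combining the syntactic feature words with the keywords, and inputting the combined data into a pre-trained text generation model to obtain target questions corresponding to the keywords comprises:
    combining the syntactic feature words with the keywords, and inputting the combined data into the pre-trained text generation model to obtain candidate questions carrying probability data; and
    when the probability data of a candidate question is greater than a preset probability threshold, taking the candidate question as the target question corresponding to the keyword.
  9. The method according to claim 1, wherein combining the syntactic feature words with the keywords, and inputting the combined data into the pre-trained text generation model to obtain candidate questions carrying probability data comprises:
    filling the keyword into a slot in the syntactic feature words to obtain a combined question; and
    inputting the combined question into the pre-trained text generation model to obtain the candidate questions carrying probability data.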
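  10. A question-answer corpus generation apparatus based on a text generation model, comprising:
    a data acquisition module, configured to obtain historical questions and a standard document, and extract keywords in the standard document and explanatory sentences corresponding to the keywords;
    a historical question segmentation module, configured to perform word segmentation on the historical questions, and identify and discard entity nouns in the historical questions to obtain syntactic feature words of the historical questions;
    a target question generation module, configured to combine the syntactic feature words with the keywords and input the combined data into a pre-trained text generation model to obtain target questions corresponding to the keywords, wherein the text generation model is trained on training samples annotated with keywords and syntactic feature words; and
    a question-answer pair construction module, configured to construct, according to the target question corresponding to the keyword and the explanatory sentence corresponding to the keyword, a question-answer pair comprising the target question and the explanatory sentence.
  11. A computer device, comprising a memory and one or more processors, the memory storing computer-readable instructions that, when executed by the one or more processors, cause the one or more processors to perform the following steps:
    obtaining historical questions and a standard document, and extracting keywords in the standard document and explanatory sentences corresponding to the keywords;
    performing word segmentation on the historical questions, and identifying and discarding entity nouns in the historical questions to obtain syntactic feature words of the historical questions;
    combining the syntactic feature words with the keywords, and inputting the combined data into a pre-trained text generation model to obtain target questions corresponding to the keywords, wherein the text generation model is trained on training samples annotated with keywords and syntactic feature words; and
    constructing, according to the target question corresponding to the keyword and the explanatory sentence corresponding to the keyword, a question-answer pair comprising the target question and the explanatory sentence.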
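  12. The computer device according to claim 11, wherein the processor, when executing the computer-readable instructions, further performs the following steps:
    segmenting the historical questions by part of speech to obtain a segmentation result;
    filtering entity nouns out of the segmentation result to obtain a syntactic structure and interrogative words of the historical questions; and
    obtaining the syntactic feature words of the historical questions according to the syntactic structure and the interrogative words.
  13. The computer device according to claim 11, wherein the processor, when executing the computer-readable instructions, further performs the following steps:
    obtaining a historical question set and the standard document;
    extracting the keywords in the standard document and the explanatory sentences corresponding to the keywords; and
    performing similarity matching between historical questions in the historical question set and the keywords to obtain historical questions corresponding to the keywords.
  14. The computer device according to claim 11, wherein the processor, when executing the computer-readable instructions, further performs the following steps:
    performing word segmentation on historical questions in the historical question set to obtain segmentation results corresponding to the historical questions;
    computing a Jaccard similarity between the entity nouns in the segmentation results and the keyword to obtain a similarity between the historical question and the keyword; and
    filtering the historical questions by comparing the similarities corresponding to the historical questions to obtain the historical question corresponding to the keyword.
  15. The computer device according to claim 11, wherein the processor, when executing the computer-readable instructions, further performs the following steps:
    obtaining consultation questions from a user consultation log;
    performing clause-term and industry technical-term recognition on the consultation questions to obtain keywords in the consultation questions, and performing word segmentation on the consultation questions and discarding entity nouns in the questions to obtain syntactic feature words in the consultation questions;
    constructing a training data set with the keywords and syntactic feature words corresponding to the consultation questions as input data and the consultation questions as target output data; and
    training an initial text generation model according to the training data set to obtain the text generation model.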
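  16. One or more computer-readable storage media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:
    obtaining historical questions and a standard document, and extracting keywords in the standard document and explanatory sentences corresponding to the keywords;
    performing word segmentation on the historical questions, and identifying and discarding entity nouns in the historical questions to obtain syntactic feature words of the historical questions;
    combining the syntactic feature words with the keywords, and inputting the combined data into a pre-trained text generation model to obtain target questions corresponding to the keywords, wherein the text generation model is trained on training samples annotated with keywords and syntactic feature words; and
    constructing, according to the target question corresponding to the keyword and the explanatory sentence corresponding to the keyword, a question-answer pair comprising the target question and the explanatory sentence.
  17. The storage medium according to claim 16, wherein the computer-readable instructions, when executed by the processor, further perform the following steps:
    segmenting the historical questions by part of speech to obtain a segmentation result;
    filtering entity nouns out of the segmentation result to obtain a syntactic structure and interrogative words of the historical questions; and
    obtaining the syntactic feature words of the historical questions according to the syntactic structure and the interrogative words.
  18. The storage medium according to claim 16, wherein the computer-readable instructions, when executed by the processor, further perform the following steps:
    obtaining a historical question set and the standard document;
    extracting the keywords in the standard document and the explanatory sentences corresponding to the keywords; and
    performing similarity matching between historical questions in the historical question set and the keywords to obtain historical questions corresponding to the keywords.
  19. The storage medium according to claim 16, wherein the computer-readable instructions, when executed by the processor, further perform the following steps:
    performing word segmentation on historical questions in the historical question set to obtain segmentation results corresponding to the historical questions;
    computing a Jaccard similarity between the entity nouns in the segmentation results and the keyword to obtain a similarity between the historical question and the keyword; and
    filtering the historical questions by comparing the similarities corresponding to the historical questions to obtain the historical question corresponding to the keyword.
  20. The storage medium according to claim 16, wherein the computer-readable instructions, when executed by the processor, further perform the following steps:
    obtaining consultation questions from a user consultation log;
    performing clause-term and industry technical-term recognition on the consultation questions to obtain keywords in the consultation questions, and performing word segmentation on the consultation questions and discarding entity nouns in the questions to obtain syntactic feature words in the consultation questions;
    constructing a training data set with the keywords and syntactic feature words corresponding to the consultation questions as input data and the consultation questions as target output data; and
    training an initial text generation model according to the training data set to obtain the text generation model.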
PCT/CN2021/090798 2020-11-04 2021-04-29 Question-answer corpus generation method and apparatus based on a text generation model WO2022095368A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011216642.7 2020-11-04
CN202011216642.7A CN112328762B (zh) 2020-11-04 Question-answer corpus generation method and apparatus based on a text generation model

Publications (1)

Publication Number Publication Date
WO2022095368A1 true WO2022095368A1 (zh) 2022-05-12

Family

ID=74324724

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/090798 WO2022095368A1 (zh) 2020-11-04 2021-04-29 Question-answer corpus generation method and apparatus based on a text generation model

Country Status (2)

Country Link
CN (1) CN112328762B (zh)
WO (1) WO2022095368A1 (zh)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
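CN116187346A (zh) * 2023-05-05 2023-05-30 世优(北京)科技有限公司 Human-computer interaction method, apparatus, system, and medium
CN116756296A (zh) * 2023-08-18 2023-09-15 中联神帆(北京)科技有限公司 Privacy-protection-based consultation information management method and system
CN116842148A (zh) * 2023-05-17 2023-10-03 北京易聊科技有限公司 Method and system for automatic question-answer extraction from unannotated corpora
CN116911311A (zh) * 2023-08-02 2023-10-20 北京市农林科学院 Question answering method for technical consultation in the agricultural domain
CN117093706A (zh) * 2023-10-19 2023-11-21 杭州烛微智能科技有限责任公司 Test paper generation method, system, medium, and electronic device
CN117992600A (zh) * 2024-04-07 2024-05-07 之江实验室 Service execution method and apparatus, storage medium, and electronic device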

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
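CN112328762B (zh) * 2020-11-04 2023-12-19 平安科技(深圳)有限公司 Question-answer corpus generation method and apparatus based on a text generation model
CN112949280B (zh) * 2021-03-02 2023-07-07 中国联合网络通信集团有限公司 Data processing method and apparatus
CN112989205A (zh) * 2021-04-14 2021-06-18 北京有竹居网络技术有限公司 Media copy recommendation method and apparatus, medium, and electronic device
CN113157897B (zh) * 2021-05-26 2024-06-11 中国平安人寿保险股份有限公司 Corpus generation method and apparatus, computer device, and storage medium
CN113326691B (zh) * 2021-05-27 2023-07-28 北京百度网讯科技有限公司 Data processing method and apparatus, electronic device, and computer-readable medium
CN114328852B (zh) * 2021-08-26 2024-06-14 腾讯科技(深圳)有限公司 Text processing method, related apparatus, and device
CN113808758B (zh) * 2021-08-31 2024-06-07 联仁健康医疗大数据科技股份有限公司 Method and apparatus for standardizing test data, electronic device, and storage medium
CN114254090A (zh) * 2021-12-08 2022-03-29 马上消费金融股份有限公司 Method and apparatus for expanding a question-answer knowledge base
CN116069936B (zh) * 2023-02-28 2023-08-01 北京朗知网络传媒科技股份有限公司 Method and apparatus for generating digital media articles
CN116431838B (zh) * 2023-06-15 2024-01-30 北京墨丘科技有限公司 Literature retrieval method, apparatus, system, and storage medium
CN117350387B (zh) * 2023-12-05 2024-04-02 中水三立数据技术股份有限公司 Intelligent question answering system based on a water conservancy knowledge platform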

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9348817B2 (en) * 2014-01-09 2016-05-24 International Business Machines Corporation Automatic generation of question-answer pairs from conversational text
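CN107832374A (zh) * 2017-10-26 2018-03-23 平安科技(深圳)有限公司 Method for constructing a standard knowledge base, electronic device, and storage medium
CN108763529A (zh) * 2018-05-31 2018-11-06 苏州大学 Intelligent retrieval method and apparatus, and computer-readable storage medium
CN110390006A (zh) * 2019-07-23 2019-10-29 腾讯科技(深圳)有限公司 Question-answer corpus generation method and apparatus, and computer-readable storage medium
CN112328762A (zh) * 2020-11-04 2021-02-05 平安科技(深圳)有限公司 Question-answer corpus generation method and apparatus based on a text generation model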

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
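RU2564629C1 (ru) * 2014-03-31 2015-10-10 Общество с ограниченной ответственностью "Аби ИнфоПоиск" Method of clustering search results depending on semantics
CN104850539B (zh) * 2015-05-28 2017-08-25 宁波薄言信息技术有限公司 Natural language understanding method and tourism question answering system based on the method
US10769185B2 (en) * 2015-10-16 2020-09-08 International Business Machines Corporation Answer change notifications based on changes to user profile information
CN107305550A (zh) * 2016-04-19 2017-10-31 中兴通讯股份有限公司 Intelligent question answering method and apparatus
CN108446286B (zh) * 2017-02-16 2023-04-25 阿里巴巴集团控股有限公司 Method, apparatus, and server for generating answers to natural language questions
CN110019305B (zh) * 2017-12-18 2024-03-15 上海智臻智能网络科技股份有限公司 Knowledge base expansion method, storage medium, and terminal
CN108287822B (zh) * 2018-01-23 2022-03-01 北京容联易通信息技术有限公司 Chinese similar-question generation system and method
CN108804521B (zh) * 2018-04-27 2021-05-14 南京柯基数据科技有限公司 Knowledge-graph-based question answering method and agricultural encyclopedia question answering system
CN109145292B (zh) * 2018-07-26 2022-05-27 黑龙江工程学院 Method for constructing a paraphrase text deep matching model, and paraphrase text deep matching method
CN109977370B (zh) * 2019-03-19 2023-06-16 河海大学常州校区 Automatic question-answer pair construction method based on a document structure tree
CN110851576A (zh) * 2019-10-16 2020-02-28 迈达斯智能(深圳)有限公司 Question answering processing method, apparatus, device, and readable medium
CN110941708B (zh) * 2019-11-04 2023-02-28 智器云南京信息科技有限公司 Intelligent question-answer library construction method, intelligent question answering method and apparatus, and computer device
CN111597321B (zh) * 2020-07-08 2024-06-11 腾讯科技(深圳)有限公司 Method and apparatus for predicting answers to questions, storage medium, and electronic device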

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
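CN116187346A (zh) * 2023-05-05 2023-05-30 世优(北京)科技有限公司 Human-computer interaction method, apparatus, system, and medium
CN116842148A (zh) * 2023-05-17 2023-10-03 北京易聊科技有限公司 Method and system for automatic question-answer extraction from unannotated corpora
CN116842148B (zh) * 2023-05-17 2023-12-05 北京易聊科技有限公司 Method and system for automatic question-answer extraction from unannotated corpora
CN116911311A (zh) * 2023-08-02 2023-10-20 北京市农林科学院 Question answering method for technical consultation in the agricultural domain
CN116756296A (zh) * 2023-08-18 2023-09-15 中联神帆(北京)科技有限公司 Privacy-protection-based consultation information management method and system
CN116756296B (zh) * 2023-08-18 2023-11-17 中联神帆(北京)科技有限公司 Privacy-protection-based consultation information management method and system
CN117093706A (zh) * 2023-10-19 2023-11-21 杭州烛微智能科技有限责任公司 Test paper generation method, system, medium, and electronic device
CN117093706B (zh) * 2023-10-19 2024-01-09 杭州烛微智能科技有限责任公司 Test paper generation method, system, medium, and electronic device
CN117992600A (zh) * 2024-04-07 2024-05-07 之江实验室 Service execution method and apparatus, storage medium, and electronic device
CN117992600B (zh) * 2024-04-07 2024-06-11 之江实验室 Service execution method and apparatus, storage medium, and electronic device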

Also Published As

Publication number Publication date
CN112328762A (zh) 2021-02-05
CN112328762B (zh) 2023-12-19

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21888072

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21888072

Country of ref document: EP

Kind code of ref document: A1