WO2021139297A1 - Question answering method, question answering device and storage device based on Transformer model - Google Patents

Question answering method, question answering device and storage device based on Transformer model

Info

Publication number
WO2021139297A1
Authority
WO
WIPO (PCT)
Prior art keywords
question
sequence
answer
sentence
transformer model
Prior art date
Application number
PCT/CN2020/121199
Other languages
English (en)
French (fr)
Inventor
骆加维
吴信朝
周宸
周宝
陈远旭
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2021139297A1 publication Critical patent/WO2021139297A1/zh

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Definitions

  • This application relates to the technical field of natural language processing, and in particular to a question answering method, question answering device and storage device based on a Transformer model.
  • The traditional online question answering system is built as a pipeline. Under the premise of single-round question answering or domain-knowledge question answering, a knowledge base and the answers corresponding to the questions in the knowledge base are set up in advance; when a user asks a question, an intent recognition module first identifies the actual intent of the user's question, the recognized intent narrows the screening range of the knowledge base, candidate questions are recalled, deep semantic similarity matching is performed by a deep learning model, and finally a text answer with a higher matching degree is returned.
  • Compared with traditional knowledge base question answering, the end-to-end question answering model system has the following disadvantages: 1. The intent of the question is not recognized accurately. 2. The reply answers are not humanized enough. 3. The contextual connection during the dialogue is not close enough; it is more like simple question answering in a single round of dialogue.
  • This application provides a question answering method, question answering device, and storage device based on a Transformer model, which can solve the problems of inaccurate recognition of question intent, insufficiently humanized reply answers, and insufficiently close contextual connection during the dialogue.
  • One technical solution adopted by this application is to provide a question answering method based on the Transformer model, including: obtaining the question text input by a user and processing the question text to obtain a question sequence; decoding the question sequence to obtain multiple candidate answers related to the question sequence; splicing the question sequence with each candidate answer; and scoring each splicing result and selecting the candidate answer corresponding to the highest score as the optimal answer to the question sequence.
  • Another technical solution adopted by this application is to provide a question answering device based on the Transformer model, including:
  • an obtaining module, configured to obtain the question text input by the user, process the question text, and obtain the question sequence;
  • a decoding module, coupled to the obtaining module and configured to decode the question sequence to obtain multiple candidate answers related to the question sequence;
  • a splicing module, coupled to the decoding module and configured to splice the question sequence with each candidate answer;
  • a scoring module, coupled to the splicing module and configured to score each splicing result and select the candidate answer corresponding to the highest score as the optimal answer to the question sequence.
  • Still another technical solution adopted by this application is to provide a storage device storing a program file capable of implementing the above question answering method based on the Transformer model; when the program file is executed by a processor, the steps of the above question answering method are implemented.
  • The beneficial effects of this application are: by inputting the question sequence into the decoding layer, multiple candidate answers related to the question sequence are obtained, which increases the diversity of answers and effectively avoids the mechanical nature of returning the same reply after the user inputs a question.
  • At the same time, the question sequence is spliced with each candidate answer, each splicing result is scored, and the candidate answer corresponding to the highest score is selected as the optimal answer to the question sequence, which strengthens contextual relevance and effectively filters out overly colloquial replies.
  • FIG. 1 is a schematic diagram of a partial network structure of a Transformer model of an embodiment of the present application
  • FIG. 2 is a schematic flowchart of a question and answer method based on a Transformer model in the first embodiment of the present application
  • FIG. 3 is a schematic flowchart of step S202 in FIG. 2;
  • FIG. 4 is a schematic flowchart of a question and answer method based on a Transformer model according to a second embodiment of the present application
  • FIG. 5 is a schematic structural diagram of a question and answer device based on a Transformer model according to an embodiment of the present application
  • Fig. 6 is a schematic structural diagram of a storage device according to an embodiment of the present application.
  • The terms "first", "second", and "third" in this application are only used for descriptive purposes and cannot be understood as indicating or implying relative importance or implicitly indicating the number of the technical features indicated. Thus, a feature defined with "first", "second", or "third" may explicitly or implicitly include at least one such feature.
  • In the description of this application, "a plurality of" means at least two, such as two, three, etc., unless otherwise specifically defined. All directional indications (such as up, down, left, right, front, back...) in the embodiments of this application are only used to explain the relative positional relationship, movement, etc. between the components in a specific posture (as shown in the drawings); if the specific posture changes, the directional indication changes accordingly.
  • Referring to FIG. 1, the network structure of the Transformer model in the embodiment of the present application includes a decoding layer 10 and a mutual information layer 20 located after the decoding layer 10, where the decoding layer 10 includes a self-attention mechanism module 11, a feedforward network module 12, and a normalization processing module 13 that are arranged in sequence.
  • Fig. 2 is a schematic flow chart of the question and answer method based on the Transformer model of the first embodiment of the present application. It should be noted that if there are substantially the same results, the method of the present application is not limited to the sequence of the process shown in Fig. 2. As shown in Figure 2, the method includes steps:
  • Step S201 Obtain the question text input by the user, and process the question text to obtain the question sequence.
  • In step S201, the question text includes a question sentence and the dialogue sentences containing the question sentence. First, tags are inserted into the question sentence and the dialogue sentences: specifically, a start tag is inserted at the beginning of the question sentence, an end tag is inserted at the end of the question sentence, and a separator tag is inserted between the dialogue sentences, for example, 「Beg」Query「Sep」Sen「Sep」Sen, where Beg marks the beginning of the question that opens the dialogue, Sep marks the end of the question, and all subsequent dialogue sentences are separated by Sep.
  • In this embodiment, a dialogue can be opened at any time; apart from tagging the dialogue-opening sentence once, subsequent question and answer sentences are no longer distinguished and are spliced indiscriminately in full.
  • In addition, contextual information association can be strengthened: based on the information exchange of the dialogue, and in contrast to the one-question-one-answer of the traditional pipeline model, the distinction between question sentences and answer sentences is no longer obvious.
  • encoding and word embedding are performed on the question sentence after the label is inserted, and the question sentence sequence is obtained.
  • the word embedding in this embodiment adopts NLP general model technology.
  • the question sequence in this embodiment includes sequence coding and position coding, where the position coding is relative position coding, and the use of relative position coding can effectively improve the relevance of short-distance conversations.
  • Step S202 Decoding the question sequence to obtain multiple candidate answers related to the question sequence.
  • In step S202, the input question sequence in this embodiment is formed by adding the sequence encoding and the position encoding. The question sequence is first input into the decoding layer, which outputs one candidate answer related to the question sequence; the question sequence is then spliced with the output of the decoding layer and input into the decoding layer again, in a loop, to obtain multiple candidate answers.
  • For example, the question sequence Q1 is first input into the decoding layer, which outputs a candidate answer A1; Q1 is then concatenated with A1 and input into the decoding layer again, which outputs another candidate answer A2; Q1 is then concatenated with A2 and input into the decoding layer again, which outputs a further candidate answer A3; the loop is repeated multiple times to obtain candidate answers A1, A2, A3, and so on.
  • In this step, by inputting the question sequence into the decoding layer, multiple candidate answers related to the question sequence are obtained, which increases the diversity of answers and effectively avoids the mechanical nature of returning the same reply after the user enters a question.
  • Referring to FIG. 3, step S202 further includes the following steps, executed in sequence:
  • Step S301 Use the self-attention mechanism module to perform feature extraction on the question sequence.
  • the self-attention mechanism module involves the attention mechanism at different positions in a single sequence, and can calculate the representation of the question sequence, thereby effectively improving the implicit semantic feature extraction capability of the text.
  • In this embodiment, when the decoding layer receives an input vector (formed by splicing the sequence encoding and the position encoding), the self-attention mechanism module multiplies the input vector by the attention weight vectors and adds the bias vectors to obtain the key, value, and query vectors of that input vector.
  • Step S302 Use the feedforward network module to perform nonlinear transformation on the feature extraction result.
  • step S302 the feedforward network module adopts an FFNN feedforward network, and the FFNN feedforward network performs a nonlinear transformation on the feature extraction result and projects it back to the dimensionality of the model.
  • Step S303 Use the normalization processing module to perform normalization processing on the nonlinear transformation result.
  • step S303 the normalization processing module uses the softmax function to perform normalization processing.
  • the normalization processing module ensures the uniformity of the distribution of the sample input and the final output, and can effectively accelerate the convergence.
  • In a specific embodiment, the workflow of step S202 proceeds as follows:
  • the structure of the Transformer model includes an Encoder (encoder) and a Decoder (decoder).
  • In this embodiment, the input part of the Transformer model is formed by passing the Embedding (word vectors) through Position Encoding (PE) and is then fed to the encoder and decoder; in the input of the Transformer model, the word vector and the position encoding result are added together before being input to the encoder/decoder.
  • Specifically, PE is calculated as PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)), where pos is the position of the word in the sequence, d_model is the dimension of the model, 2i denotes the even dimensions, and 2i+1 denotes the odd dimensions.
  • The encoder has two sub-layers, namely a Multi-head attention layer (multi-head attention mechanism) and a Feed-forward Networks layer (fully connected network). The multi-head attention mechanism uses self-attention to learn the relationships within the source sentence, and the fully connected network performs the same operation on the vector at each position, consisting of two linear transformations and a ReLU activation function.
  • The decoder has three sub-layers: a Masked multi-head attention layer (masked multi-head attention mechanism), a Multi-head attention layer (multi-head attention mechanism), and a Feed-forward Networks layer (fully connected network). The multi-head attention mechanism is composed of multiple self-attention mechanisms. The masked multi-head attention mechanism uses self-attention to learn the internal relationships of the target sentence; its output, together with the result passed from the encoder, is then input to the multi-head attention layer above it. That multi-head attention layer is not self-attention but encoder-decoder attention, and is used to learn the relationship between the source sentence and the target sentence.
  • In the multi-head attention mechanism, the similarity between K (key) and Q (query vector) is first computed to obtain S (similarity); S is then normalized through the softmax function to obtain the weights a; finally, the weighted sum of a and V (value) yields the attention vector, which is thus a function of K (key), V (value), and Q (query vector). In the self-attention mechanism, K, V, and Q are identical. In the multi-head attention mechanism of the decoder, Q represents the output of the previous decoder step, while K and V come from the output of the encoder.
  • Above each multi-head attention mechanism there is also an Add & Norm layer. Add stands for Residual Connection, which is used to prevent network degradation; Norm stands for Layer Normalization, which normalizes the activation values of each layer, that is, converts the input into data with a mean of 0 and a variance of 1, so as to prevent the data from falling into the saturation region of the activation function. The layer normalization computes the mean and variance over each individual sample rather than over a batch of data.
  • The encoder and decoder of this embodiment are basically the same; the difference is that a Mask is added.
  • Mask can mask certain values so that they do not play a role when the parameters are updated.
  • the main purpose of using mask in the decoder is to ensure that the word at the i-th position can only use the first i-1 words when making predictions, and will not use future information.
  • Step S203: Concatenate the question sequence with each candidate answer.
  • In step S203, the input question sequence and the multiple candidate answers output in step S202 are respectively spliced to obtain multiple splicing results.
  • The splicing form is 「Begin」Query「Sep」Ans, where Query represents the question sequence and Ans represents a candidate answer.
  • For example, the question sequence Q1 is spliced with the candidate answers A1, A2, and A3, respectively, and the splicing results are 「Begin」Q1「Sep」A1, 「Begin」Q1「Sep」A2, and 「Begin」Q1「Sep」A3.
  • Step S204: Score each splicing result, and select the candidate answer corresponding to the highest score as the optimal answer to the question sequence.
  • In step S204, based on a joint probability distribution algorithm and a reverse-scoring trained model, the correlation between the question sequence and the candidate answer in each splicing result is calculated and scored; the higher the correlation, the higher the corresponding score. The candidate answer corresponding to the highest score is selected as the optimal answer to the question sequence, so that the final output answer is not only an appropriate reply in the preceding background context but also a reply close to the overall dialogue intent.
  • The question answering method based on the Transformer model of the first embodiment of the present application increases the diversity of answers by obtaining multiple candidate answers related to one question sequence, effectively avoiding the mechanical nature of returning the same reply after the user enters a question; at the same time, the question sequence is spliced with each candidate answer, each splicing result is scored, and the candidate answer corresponding to the highest score is selected as the optimal answer to the question sequence, which strengthens contextual relevance and effectively filters out overly colloquial replies.
  • FIG. 4 is a schematic flowchart of the question and answer method based on the Transformer model in the second embodiment of the present application. It should be noted that if there are substantially the same results, the method of the present application is not limited to the sequence of the process shown in FIG. 4. As shown in Figure 4, the method includes the steps:
  • Step S401 Construct a Transformer model.
  • the network structure of the Transformer model includes a decoding layer and a mutual information layer located after the decoding layer, where the decoding layer includes: a self-attention mechanism module, a feedforward network module, and a normalization processing module that are sequentially set.
  • Step S402 Use the loss function to optimize the Transformer model.
  • In step S402, the loss function includes the loss function of the decoding layer and the loss function of the mutual information layer.
  • First, the loss deviation value of the decoding layer and the loss deviation value of the mutual information layer are calculated; the maximum value of their superposition is taken as the loss deviation value of the Transformer model; and the parameters of the Transformer model are updated according to the loss deviation value of the Transformer model.
  • Specifically, Loss = Max(Loss_AR + Loss_MMI), where Loss represents the loss deviation value of the Transformer model, Loss_AR represents the loss deviation value of the decoding layer, and Loss_MMI represents the loss deviation value of the mutual information layer.
  • The loss deviation value of the Transformer model of this embodiment is the maximum value after the loss deviation value of the decoding layer and the loss deviation value of the mutual information layer are superimposed. The loss deviation value of the mutual information layer of this embodiment is a variable; during the calculation, the result with the highest correlation between the current input question and the preceding dialogue is taken.
  • The loss deviation value of the decoding layer is calculated according to the following formula: Loss_AR = Σ_{t=1}^{T} log P(x_t | x_{z<t}), where P represents probability, x represents a word, z and t represent positions of words in the question text and are integers between 1 and T, x_t represents the word at position t, and x_{z<t} represents the words before position t.
  • The loss deviation value of the mutual information layer is calculated according to the formula Loss_MMI = Max(P(m/n)), where P represents probability, n represents the vector of the current input question, m represents the vector of the preceding dialogue information, and P(m/n) represents the probability of the correlation between the current input question and the preceding dialogue.
  • Steps S403 to S406 are similar to steps S201 to S204 in FIG. 2 and will not be described in detail here. Steps S401 and S402 in this embodiment can be executed before step S403 or after step S403.
  • On the basis of the first embodiment, the question answering method based on the Transformer model of the second embodiment of the present application optimizes the Transformer model to make the output more accurate and reliable.
  • Fig. 5 is a schematic structural diagram of a question and answer device based on a Transformer model according to an embodiment of the present application.
  • the question and answer device 50 includes an acquisition module 51, a decoding module 52, a splicing module 53 and a scoring module 54.
  • the obtaining module 51 is used to obtain the question text input by the user, process the question text, and obtain the question sequence.
  • the decoding module 52 is coupled to the obtaining module 51, and is used to decode the question sequence to obtain multiple candidate answers related to the question sequence.
  • the splicing module 53 is coupled to the decoding module 52, and is used for splicing the question sequence with each candidate answer.
  • the scoring module 54 is coupled with the splicing module 53 and is used to score each splicing result, and select the candidate answer corresponding to the highest score as the optimal answer of the question sequence.
  • FIG. 6 is a schematic structural diagram of a storage device according to an embodiment of the application.
  • the storage device in the embodiment of the present application stores a program file 61 that can implement all the above methods.
  • The program file 61 may be stored in the above storage device in the form of a software product and includes a number of instructions for enabling a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage device may be non-volatile or volatile.
  • The storage device includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or other media that can store program code, or terminal devices such as computers, servers, mobile phones, and tablets.
  • the disclosed system, device, and method can be implemented in other ways.
  • The device embodiments described above are merely illustrative; for example, the division of units is only a logical functional division, and there may be other division manners in actual implementation, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Human Computer Interaction (AREA)
  • Evolutionary Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

This application relates to the technical field of natural language processing, and specifically discloses a question answering method, a question answering device, and a storage device based on a Transformer model. The question answering method includes: obtaining the question text input by a user and processing the question text to obtain a question sequence; decoding the question sequence to obtain multiple candidate answers related to the question sequence; splicing the question sequence with each candidate answer; and scoring each splicing result and selecting the candidate answer corresponding to the highest score as the optimal answer to the question sequence. Through the above approach, this application can solve the problems of inaccurate recognition of question intent, insufficiently humanized reply answers, and insufficiently close contextual connection during a dialogue.

Description

Question answering method, question answering device and storage device based on a Transformer model
This application claims priority to the Chinese patent application filed with the China Patent Office on July 28, 2020, with application number 202010737212.3 and entitled "Question answering method, question answering device and storage device based on Transformer model", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the technical field of natural language processing, and in particular to a question answering method, a question answering device, and a storage device based on a Transformer model.
Background
Traditional online question answering systems are built as pipelines. Under the premise of single-round question answering or domain-knowledge question answering, a knowledge base and the answers corresponding to the questions in the knowledge base are set up in advance. When a user asks a question, an intent recognition module first identifies the actual intent of the user's question; after intent recognition, the screening range of the knowledge base is narrowed and candidate questions are recalled; deep semantic similarity matching is then performed by a deep learning model, and finally a text answer with a higher matching degree is returned. In addition to the pipeline-based approach, end-to-end dialogue systems are also developing rapidly. However, the inventors found that, compared with traditional knowledge base question answering, end-to-end question answering model systems have the following drawbacks: 1. The intent of the question is not recognized accurately. 2. The reply answers are not humanized enough. 3. The contextual connection during the dialogue is not close enough, and it is more like simple question answering in a single round of dialogue.
Summary
This application provides a question answering method, a question answering device, and a storage device based on a Transformer model, which can solve the problems of inaccurate recognition of question intent, insufficiently humanized reply answers, and insufficiently close contextual connection during a dialogue.
To solve the above technical problem, one technical solution adopted by this application is to provide a question answering method based on a Transformer model, including:
obtaining the question text input by a user, and processing the question text to obtain a question sequence;
decoding the question sequence to obtain multiple candidate answers related to the question sequence;
splicing the question sequence with each of the candidate answers;
scoring each splicing result, and selecting the candidate answer corresponding to the highest score as the optimal answer to the question sequence.
To solve the above technical problem, another technical solution adopted by this application is to provide a question answering device based on a Transformer model, including:
an obtaining module, configured to obtain the question text input by a user, process the question text, and obtain a question sequence;
a decoding module, coupled to the obtaining module and configured to decode the question sequence to obtain multiple candidate answers related to the question sequence;
a splicing module, coupled to the decoding module and configured to splice the question sequence with each of the candidate answers;
a scoring module, coupled to the splicing module and configured to score each splicing result and select the candidate answer corresponding to the highest score as the optimal answer to the question sequence.
To solve the above technical problem, still another technical solution adopted by this application is to provide a storage device storing a program file capable of implementing the above question answering method based on the Transformer model; when the program file is executed by a processor, the following steps are implemented:
obtaining the question text input by a user, and processing the question text to obtain a question sequence;
decoding the question sequence to obtain multiple candidate answers related to the question sequence;
splicing the question sequence with each of the candidate answers;
scoring each splicing result, and selecting the candidate answer corresponding to the highest score as the optimal answer to the question sequence.
The beneficial effects of this application are: by inputting the question sequence into the decoding layer, multiple candidate answers related to the question sequence are obtained, which increases the diversity of answers and effectively avoids the mechanical nature of returning the same reply after the user inputs a question; at the same time, the question sequence is spliced with each candidate answer, each splicing result is scored, and the candidate answer corresponding to the highest score is selected as the optimal answer to the question sequence, which strengthens contextual relevance and effectively filters out overly colloquial replies.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of a partial network structure of a Transformer model according to an embodiment of the present application;
FIG. 2 is a schematic flowchart of a question answering method based on a Transformer model according to a first embodiment of the present application;
FIG. 3 is a schematic flowchart of step S202 in FIG. 2;
FIG. 4 is a schematic flowchart of a question answering method based on a Transformer model according to a second embodiment of the present application;
FIG. 5 is a schematic structural diagram of a question answering device based on a Transformer model according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a storage device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of this application will be described clearly and completely below with reference to the accompanying drawings in the embodiments of this application. Obviously, the described embodiments are only some of the embodiments of this application, not all of them. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of this application.
The terms "first", "second", and "third" in this application are only used for descriptive purposes and cannot be understood as indicating or implying relative importance or implicitly indicating the number of the technical features indicated. Thus, a feature defined with "first", "second", or "third" may explicitly or implicitly include at least one such feature. In the description of this application, "a plurality of" means at least two, such as two, three, etc., unless otherwise specifically defined. All directional indications (such as up, down, left, right, front, back...) in the embodiments of this application are only used to explain the relative positional relationship, movement, etc. between components in a specific posture (as shown in the drawings); if the specific posture changes, the directional indication changes accordingly. In addition, the terms "including" and "having" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but optionally further includes steps or units that are not listed, or optionally further includes other steps or units inherent to the process, method, product, or device.
Reference to an "embodiment" herein means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of this application. The appearance of the phrase in various places in the specification does not necessarily refer to the same embodiment, nor to an independent or alternative embodiment mutually exclusive with other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
This application relates to the field of artificial intelligence technology, and specifically to natural language processing technology. Referring to FIG. 1, the network structure of the Transformer model of the embodiment of this application includes a decoding layer 10 and a mutual information layer 20 located after the decoding layer 10, where the decoding layer 10 includes a self-attention mechanism module 11, a feedforward network module 12, and a normalization processing module 13 that are arranged in sequence. FIG. 2 is a schematic flowchart of the question answering method based on the Transformer model of the first embodiment of this application. It should be noted that, if substantially the same results are obtained, the method of this application is not limited to the order of the flow shown in FIG. 2. As shown in FIG. 2, the method includes the following steps:
Step S201: Obtain the question text input by the user, and process the question text to obtain a question sequence.
In step S201, the question text includes a question sentence and the dialogue sentences containing the question sentence. First, tags are inserted into the question sentence and the dialogue sentences; specifically, a start tag is inserted at the beginning of the question sentence, an end tag is inserted at the end of the question sentence, and a separator tag is inserted between the dialogue sentences, for example, 「Beg」Query「Sep」Sen「Sep」Sen, where Beg denotes the beginning of the question that opens the dialogue, Sep denotes the end of the question, and all subsequent dialogue sentences are separated by Sep. In this embodiment, a dialogue exchange can be opened at any time; apart from tagging the dialogue-opening sentence once, subsequent question and answer sentences are no longer distinguished and are spliced indiscriminately in full. In addition, contextual information association can be strengthened: based on the information exchange of the dialogue, and in contrast to the one-question-one-answer of the traditional pipeline mode, the distinction between question sentences and answer sentences is no longer obvious. Then, encoding and word embedding are performed on the tagged question sentence to obtain the question sequence. The word embedding in this embodiment adopts general NLP model techniques. The question sequence in this embodiment includes a sequence encoding and a position encoding, where the position encoding is a relative position encoding; using relative position encoding can effectively improve the association of short-distance dialogue.
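By way of illustration only, the following Python sketch reproduces the tagging and splicing scheme described above; the bracketed tag strings, the whitespace tokenization, and the function name are assumptions made for this example and are not part of the original disclosure.

```python
def build_tagged_sequence(opening_question, dialogue_sentences):
    """Splice a dialogue into the form 「Beg」Query「Sep」Sen「Sep」Sen ...

    Only the dialogue-opening question is marked with Beg; every later
    sentence is appended after a Sep tag, without distinguishing questions
    from answers (indiscriminate full splicing).
    """
    tokens = ["[Beg]"] + opening_question.split()
    for sentence in dialogue_sentences:
        tokens += ["[Sep]"] + sentence.split()
    return tokens

seq = build_tagged_sequence(
    "How do I reset my password",
    ["You can reset it from the login page", "Which login page do you mean"],
)
# seq == ['[Beg]', 'How', ..., '[Sep]', 'You', ..., '[Sep]', 'Which', ...]
```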
Step S202: Decode the question sequence to obtain multiple candidate answers related to the question sequence.
In step S202, the input question sequence in this embodiment is formed by adding the sequence encoding and the position encoding. The question sequence is first input into the decoding layer, which outputs one candidate answer related to the question sequence; then, in a loop, the question sequence is spliced with the output of the decoding layer and input into the decoding layer again to obtain multiple candidate answers. For example, the question sequence Q1 is first input into the decoding layer, which outputs a candidate answer A1; Q1 is then concatenated with A1 and input into the decoding layer again, which outputs another candidate answer A2; Q1 is then concatenated with A2 and input into the decoding layer again, which outputs a further candidate answer A3; the loop is repeated multiple times to obtain candidate answers A1, A2, A3, and so on. In this step, by inputting the question sequence into the decoding layer, multiple candidate answers related to the question sequence are obtained, which increases the diversity of answers and effectively avoids the mechanical nature of returning the same reply after the user inputs a question.
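A minimal sketch of the looped decoding described above, assuming a callable `decode` that stands in for the decoding layer (token sequence in, answer token sequence out); it is a placeholder, not the trained model of this application.

```python
def generate_candidates(decode, question_seq, num_candidates=3):
    """Loop: feed Q1 to get A1, feed Q1+A1 to get A2, feed Q1+A2 to get A3, ..."""
    candidates = []
    previous_answer = []
    for _ in range(num_candidates):
        context = list(question_seq) + list(previous_answer)  # Q1, then Q1+A1, then Q1+A2, ...
        answer = decode(context)        # placeholder for the decoding layer
        candidates.append(answer)
        previous_answer = answer
    return candidates
```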
Referring to FIG. 3, step S202 further includes the following steps, executed in sequence:
Step S301: Use the self-attention mechanism module to perform feature extraction on the question sequence.
In step S301, the self-attention mechanism module involves attention over different positions within a single sequence and can compute a representation of the question sequence, thereby effectively improving the ability to extract implicit semantic features of the text. In this embodiment, when the decoding layer receives an input vector (formed by splicing the sequence encoding and the position encoding), the self-attention mechanism module multiplies the input vector by attention weight vectors and adds bias vectors to obtain the key, value, and query vectors of that input vector.
Step S302: Use the feedforward network module to perform a nonlinear transformation on the feature extraction result.
In step S302, the feedforward network module adopts an FFNN feedforward network, which performs a nonlinear transformation on the feature extraction result and projects it back to the dimensionality of the model.
Step S303: Use the normalization processing module to normalize the nonlinear transformation result.
In step S303, the normalization processing module uses the softmax function for normalization; the normalization processing module ensures consistency between the distribution of the sample input and that of the final output, and can effectively accelerate convergence.
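The following sketch walks through steps S301 to S303 on a single input matrix. The weight matrices are random placeholders, the bias terms mentioned in step S301 are omitted for brevity, and the 1/sqrt(d_model) scaling is a common stabilization choice assumed here rather than something stated in the text.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def decoding_layer_step(x, rng):
    """Illustrative pass of S301-S303 over x of shape (seq_len, d_model)."""
    d_model, d_ff = x.shape[-1], 4 * x.shape[-1]
    Wq, Wk, Wv = [0.02 * rng.standard_normal((d_model, d_model)) for _ in range(3)]
    W1 = 0.02 * rng.standard_normal((d_model, d_ff))
    W2 = 0.02 * rng.standard_normal((d_ff, d_model))

    # S301: self-attention feature extraction (query/key/value projections of the input)
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    features = softmax(q @ k.T / np.sqrt(d_model)) @ v

    # S302: feedforward network, nonlinear transform, projected back to d_model
    ffn_out = np.maximum(0.0, features @ W1) @ W2

    # S303: softmax normalization of the result
    return softmax(ffn_out)

out = decoding_layer_step(np.random.default_rng(0).standard_normal((5, 16)),
                          np.random.default_rng(1))
```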
In a specific embodiment, the specific workflow of step S202 proceeds as follows: the structure of the Transformer model includes an Encoder and a Decoder.
In this embodiment, the input part of the Transformer model is formed by passing the Embedding (word vectors) through Position Encoding (PE) and is then input to the encoder and decoder. In the input of the Transformer model, the word vector and the position encoding result are added together and then fed into the encoder/decoder.
Specifically, PE is calculated by the following formulas:
PE(pos, 2i) = sin(pos / 10000^(2i/d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))
where pos is the position of the word in the sequence, d_model is the dimension of the model, 2i denotes the even dimensions, and 2i+1 denotes the odd dimensions.
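A small sketch of the sinusoidal position encoding written out above (assuming an even d_model); the encoder/decoder input is then the word embedding plus this encoding.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal PE: sin on the even dimensions (2i), cos on the odd ones (2i+1)."""
    pos = np.arange(seq_len)[:, None]            # pos: position of the word in the sequence
    i = np.arange(d_model // 2)[None, :]         # dimension pair index
    angles = pos / np.power(10000.0, 2.0 * i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                 # even dimensions
    pe[:, 1::2] = np.cos(angles)                 # odd dimensions
    return pe

# Model input: x = embedding + positional_encoding(embedding.shape[0], embedding.shape[1])
```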
The encoder has two sub-layers, namely a Multi-head attention layer (multi-head attention mechanism) and a Feed-forward Networks layer (fully connected network); the multi-head attention mechanism uses self-attention to learn the relationships within the source sentence, and the fully connected network performs the same operation on the vector at each position, consisting of two linear transformations and a ReLU activation function.
The decoder has three sub-layers: a Masked multi-head attention layer (masked multi-head attention mechanism), a Multi-head attention layer (multi-head attention mechanism), and a Feed-forward Networks layer (fully connected network). The multi-head attention mechanism is composed of multiple self-attention mechanisms. The masked multi-head attention mechanism uses self-attention to learn the internal relationships of the target sentence; its output, together with the result passed from the encoder, is then input to the multi-head attention layer above it. That multi-head attention layer is not self-attention but encoder-decoder attention, and is used to learn the relationship between the source sentence and the target sentence.
In the multi-head attention mechanism, the similarity between K (key) and Q (query vector) is first computed to obtain S (similarity); S is then normalized through the softmax function to obtain the weights a; finally, the weighted sum of a and V (value) yields the attention vector, which is thus a function of K (key), V (value), and Q (query vector). In the self-attention mechanism, K, V, and Q are identical. In the multi-head attention mechanism of the decoder, Q represents the output of the previous decoder step, while K and V come from the output of the encoder.
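An illustrative implementation of the attention computation just described; the 1/sqrt(d) scaling is a common stabilization detail assumed here, not stated in the text.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """S = similarity of Q and K; a = softmax(S); output = weighted sum of V.

    In self-attention Q, K and V come from the same sequence; in the decoder's
    encoder-decoder attention Q is the previous decoder output while K and V
    come from the encoder output.
    """
    S = Q @ K.T / np.sqrt(K.shape[-1])   # similarity (scaled dot product)
    a = softmax(S)                       # softmax normalization -> weights a
    return a @ V                         # weighted sum with V
```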
Above each multi-head attention mechanism there is also an Add & Norm layer. Add stands for Residual Connection, which is used to prevent network degradation; Norm stands for Layer Normalization, which normalizes the activation values of each layer, that is, converts the input into data with a mean of 0 and a variance of 1, so as to prevent the data from falling into the saturation region of the activation function. The layer normalization computes the mean and variance over each individual sample rather than over a batch of data.
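A sketch of the Add & Norm operation as described, computing the mean and variance per sample over the feature dimension; the learned gain and bias of a full layer normalization are omitted for brevity.

```python
import numpy as np

def add_and_norm(x, sublayer_out, eps=1e-6):
    """Add & Norm: residual connection followed by per-sample layer normalization."""
    y = x + sublayer_out                        # Add: residual connection against degradation
    mean = y.mean(axis=-1, keepdims=True)       # statistics per sample, not per batch
    var = y.var(axis=-1, keepdims=True)
    return (y - mean) / np.sqrt(var + eps)      # zero mean, unit variance
```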
The encoder and decoder of this embodiment are basically the same; the difference is that a Mask is added. The Mask can mask certain values so that they have no effect when the parameters are updated. The main purpose of using the mask in the decoder is to ensure that, when predicting the word at the i-th position, only the first i-1 words can be used and no future information is used.
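A sketch of masked attention: future positions are suppressed before the softmax so that each position only attends to earlier positions, matching the purpose of the mask described above.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def masked_attention(Q, K, V):
    """Attention in which a position may only use the words before it."""
    seq_len = Q.shape[0]
    mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))  # causal (lower-triangular) mask
    S = Q @ K.T / np.sqrt(K.shape[-1])
    S = np.where(mask, S, -1e9)   # masked (future) positions receive negligible weight
    a = softmax(S)
    return a @ V
```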
Step S203: Splice the question sequence with each candidate answer.
In step S203, the input question sequence and the multiple candidate answers output in step S202 are spliced respectively to obtain multiple splicing results. The splicing form is 「Begin」Query「Sep」Ans, where Query represents the question sequence and Ans represents a candidate answer. For example, the question sequence Q1 is spliced with the candidate answers A1, A2, and A3, respectively, and the splicing results are 「Begin」Q1「Sep」A1, 「Begin」Q1「Sep」A2, and 「Begin」Q1「Sep」A3.
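For illustration, a small splicing helper in the 「Begin」Query「Sep」Ans form; the bracketed tag strings are assumptions for the example.

```python
def splice(question_tokens, candidate_answers):
    """Build one 「Begin」Query「Sep」Ans sequence per candidate answer."""
    return [["[Begin]"] + list(question_tokens) + ["[Sep]"] + list(answer)
            for answer in candidate_answers]

# e.g. splice(Q1, [A1, A2, A3]) yields the three splices
# 「Begin」Q1「Sep」A1, 「Begin」Q1「Sep」A2 and 「Begin」Q1「Sep」A3.
```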
Step S204: Score each splicing result, and select the candidate answer corresponding to the highest score as the optimal answer to the question sequence.
In step S204, based on a joint probability distribution algorithm and a reverse-scoring trained model, the correlation between the question sequence and the candidate answer in each splicing result is calculated and scored; the higher the correlation, the higher the corresponding score. The candidate answer corresponding to the highest score is selected as the optimal answer to the question sequence, so that the final output answer is not only an appropriate reply in the preceding background context but also a reply close to the overall dialogue intent.
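A sketch of the selection step, assuming a `score_fn` placeholder that returns the correlation score produced by the joint-probability / reverse-scoring model; the actual scoring model is not reproduced here.

```python
def select_best_answer(question_tokens, candidate_answers, score_fn):
    """Score every 「Begin」Query「Sep」Ans splice and keep the best answer."""
    best_answer, best_score = None, float("-inf")
    for answer in candidate_answers:
        spliced = ["[Begin]"] + list(question_tokens) + ["[Sep]"] + list(answer)
        score = score_fn(spliced)          # higher correlation -> higher score
        if score > best_score:
            best_answer, best_score = answer, score
    return best_answer, best_score
```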
The question answering method based on the Transformer model of the first embodiment of this application increases the diversity of answers by obtaining multiple candidate answers related to one question sequence, effectively avoiding the mechanical nature of returning the same reply after the user inputs a question; at the same time, the question sequence is spliced with each candidate answer, each splicing result is scored, and the candidate answer corresponding to the highest score is selected as the optimal answer to the question sequence, which strengthens contextual relevance and effectively filters out overly colloquial replies.
FIG. 4 is a schematic flowchart of the question answering method based on the Transformer model of the second embodiment of this application. It should be noted that, if substantially the same results are obtained, the method of this application is not limited to the order of the flow shown in FIG. 4. As shown in FIG. 4, the method includes the following steps:
Step S401: Construct the Transformer model.
In step S401, the network structure of the Transformer model includes a decoding layer and a mutual information layer located after the decoding layer, where the decoding layer includes a self-attention mechanism module, a feedforward network module, and a normalization processing module that are arranged in sequence.
Step S402: Optimize the Transformer model using a loss function.
In step S402, the loss function includes the loss function of the decoding layer and the loss function of the mutual information layer. First, the loss deviation value of the decoding layer and the loss deviation value of the mutual information layer are calculated; the maximum value of their superposition is selected as the loss deviation value of the Transformer model; and the parameters of the Transformer model are updated according to the loss deviation value of the Transformer model.
Specifically, the loss deviation value of the Transformer model is calculated by the following formula:
Loss = Max(Loss_AR + Loss_MMI), where Loss represents the loss deviation value of the Transformer model, Loss_AR represents the loss deviation value of the decoding layer, and Loss_MMI represents the loss deviation value of the mutual information layer. The loss deviation value of the Transformer model in this embodiment is the maximum value after the loss deviation value of the decoding layer and that of the mutual information layer are superimposed. The loss deviation value of the mutual information layer in this embodiment is a variable; during the calculation, the result with the highest correlation between the current input question and the preceding dialogue is taken.
Further, the loss deviation value of the decoding layer is calculated according to the following formula:
Loss_AR = Σ_{t=1}^{T} log P(x_t | x_{z<t})
where P represents probability, x represents a word, z and t represent positions of words in the question text and are integers between 1 and T, x_t represents the word at position t, and x_{z<t} represents the words before position t.
The loss deviation value of the mutual information layer is calculated according to the following formula: Loss_MMI = Max(P(m/n)), where P represents probability, n represents the vector of the current input question, m represents the vector of the preceding dialogue information before the current input question, and P(m/n) represents the probability of the correlation between the current input question and the preceding dialogue.
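One possible reading of the loss combination above, sketched for illustration: the decoding-layer term is taken as the summed log-probabilities of the reference words, the mutual-information term as the maximum correlation probability, and the outer Max is applied over candidate results. These are interpretive assumptions, not a definitive implementation.

```python
import numpy as np

def loss_ar(token_log_probs):
    """Decoding-layer term: sum over t of log P(x_t | x_{z<t}).

    `token_log_probs` is assumed to hold the per-position log-probabilities
    of the reference words under the decoding layer.
    """
    return float(np.sum(token_log_probs))

def loss_mmi(correlation_probs):
    """Mutual-information term: Max(P(m/n)) over the candidate correlations."""
    return float(np.max(correlation_probs))

def transformer_loss(per_candidate_token_log_probs, correlation_probs):
    """Loss = Max(Loss_AR + Loss_MMI), with the outer Max read here as
    ranging over the candidate results (an interpretive assumption)."""
    mmi = loss_mmi(correlation_probs)
    return max(loss_ar(lp) + mmi for lp in per_candidate_token_log_probs)
```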
Steps S403 to S406 are similar to steps S201 to S204 in FIG. 2 and will not be described in detail here. Steps S401 and S402 of this embodiment may be executed before step S403 or after step S403.
On the basis of the first embodiment, the question answering method based on the Transformer model of the second embodiment of this application optimizes the Transformer model so that the output is more accurate and reliable.
FIG. 5 is a schematic structural diagram of the question answering device based on the Transformer model of an embodiment of this application. As shown in FIG. 5, the question answering device 50 includes an obtaining module 51, a decoding module 52, a splicing module 53, and a scoring module 54.
The obtaining module 51 is configured to obtain the question text input by the user and process the question text to obtain a question sequence.
The decoding module 52 is coupled to the obtaining module 51 and is configured to decode the question sequence to obtain multiple candidate answers related to the question sequence.
The splicing module 53 is coupled to the decoding module 52 and is configured to splice the question sequence with each candidate answer.
The scoring module 54 is coupled to the splicing module 53 and is configured to score each splicing result and select the candidate answer corresponding to the highest score as the optimal answer to the question sequence.
Referring to FIG. 6, FIG. 6 is a schematic structural diagram of a storage device according to an embodiment of this application. The storage device of the embodiment of this application stores a program file 61 capable of implementing all of the above methods. The program file 61 may be stored in the above storage device in the form of a software product and includes a number of instructions for enabling a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned storage device may be non-volatile or volatile, and includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc, or terminal devices such as computers, servers, mobile phones, and tablets.
In the several embodiments provided in this application, it should be understood that the disclosed system, device, and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative; the division of units is only a logical functional division, and there may be other division manners in actual implementation, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
In addition, the functional units in the various embodiments of this application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
The above are only embodiments of this application and do not limit the patent scope of this application. Any equivalent structural or equivalent process transformation made using the contents of the specification and drawings of this application, or any direct or indirect application in other related technical fields, is likewise included within the patent protection scope of this application.

Claims (20)

  1. A question answering method based on a Transformer model, wherein the question answering method comprises:
    obtaining a question text input by a user, and processing the question text to obtain a question sequence;
    decoding the question sequence to obtain a plurality of candidate answers related to the question sequence;
    splicing the question sequence with each of the candidate answers;
    scoring each splicing result, and selecting the candidate answer corresponding to the highest score as the optimal answer to the question sequence.
  2. The question answering method according to claim 1, wherein the network structure of the Transformer model comprises a decoding layer and a mutual information layer located after the decoding layer, and the step of decoding the question sequence to obtain a plurality of candidate answers related to the question sequence comprises:
    inputting the question sequence into the decoding layer, and outputting one candidate answer related to the question sequence;
    cyclically splicing the question sequence with the output result of the decoding layer and inputting the splice into the decoding layer again, to obtain a plurality of the candidate answers.
  3. The question answering method according to claim 2, wherein the decoding layer comprises a self-attention mechanism module, a feedforward network module, and a normalization processing module arranged in sequence, and the step of inputting the question sequence into the decoding layer and outputting one candidate answer related to the question sequence comprises:
    using the self-attention mechanism module to perform feature extraction on the question sequence;
    using the feedforward network module to perform a nonlinear transformation on the feature extraction result;
    using the normalization processing module to normalize the nonlinear transformation result.
  4. The question answering method according to claim 1, wherein the step of obtaining the question text input by the user and processing the question text to obtain the question sequence further comprises:
    obtaining the question text input by the user, the question text comprising a question sentence and dialogue sentences containing the question sentence;
    inserting tags into the question sentence and the dialogue sentences;
    performing encoding and word embedding on the tagged question sentence to obtain the question sequence, the question sequence comprising a sequence encoding and a position encoding, the position encoding being a relative position encoding.
  5. The question answering method according to claim 4, wherein the step of inserting tags into the question sentence and the dialogue sentences comprises:
    inserting a start tag at the beginning of the question sentence, inserting an end tag at the end of the question sentence, and inserting a separator tag into the dialogue sentences.
  6. The question answering method according to claim 1, wherein the step of scoring each splicing result and selecting the candidate answer corresponding to the highest score as the optimal answer to the question sequence comprises:
    calculating, based on a joint probability distribution algorithm, the correlation between the question sequence and the candidate answer in each splicing result;
    scoring the correlation, wherein the higher the degree of the correlation, the higher the corresponding score;
    selecting the candidate answer corresponding to the highest score as the optimal answer to the question sequence.
  7. The question answering method according to claim 1, wherein the question answering method further comprises:
    constructing the Transformer model, the network structure of the Transformer model comprising a decoding layer and a mutual information layer located after the decoding layer;
    optimizing the Transformer model using a loss function.
  8. The question answering method according to claim 7, wherein the step of optimizing the Transformer model using a loss function further comprises:
    calculating a loss deviation value of the decoding layer and a loss deviation value of the mutual information layer;
    selecting the maximum value of the superposition of the loss deviation value of the decoding layer and the loss deviation value of the mutual information layer as the loss deviation value of the Transformer model;
    updating parameters of the Transformer model according to the loss deviation value of the Transformer model.
  9. A question answering device based on a Transformer model, wherein the question answering device comprises:
    an obtaining module, configured to obtain a question text input by a user, process the question text, and obtain a question sequence;
    a decoding module, coupled to the obtaining module and configured to decode the question sequence to obtain a plurality of candidate answers related to the question sequence;
    a splicing module, coupled to the decoding module and configured to splice the question sequence with each of the candidate answers;
    a scoring module, coupled to the splicing module and configured to score each splicing result and select the candidate answer corresponding to the highest score as the optimal answer to the question sequence.
  10. A storage device, wherein the storage device stores a program file capable of implementing a question answering method based on a Transformer model, and the program file, when executed by a processor, implements the following steps:
    obtaining a question text input by a user, and processing the question text to obtain a question sequence;
    decoding the question sequence to obtain a plurality of candidate answers related to the question sequence;
    splicing the question sequence with each of the candidate answers;
    scoring each splicing result, and selecting the candidate answer corresponding to the highest score as the optimal answer to the question sequence.
  11. The storage device according to claim 10, wherein the network structure of the Transformer model comprises a decoding layer and a mutual information layer located after the decoding layer.
  12. The storage device according to claim 11, wherein the step of decoding the question sequence to obtain a plurality of candidate answers related to the question sequence comprises:
    inputting the question sequence into the decoding layer, and outputting one candidate answer related to the question sequence;
    cyclically splicing the question sequence with the output result of the decoding layer and inputting the splice into the decoding layer again, to obtain a plurality of the candidate answers.
  13. The storage device according to claim 12, wherein the decoding layer comprises a self-attention mechanism module, a feedforward network module, and a normalization processing module arranged in sequence.
  14. The storage device according to claim 13, wherein the step of inputting the question sequence into the decoding layer and outputting one candidate answer related to the question sequence comprises:
    using the self-attention mechanism module to perform feature extraction on the question sequence;
    using the feedforward network module to perform a nonlinear transformation on the feature extraction result;
    using the normalization processing module to normalize the nonlinear transformation result.
  15. The storage device according to claim 10, wherein the step of obtaining the question text input by the user and processing the question text to obtain the question sequence further comprises:
    obtaining the question text input by the user, the question text comprising a question sentence and dialogue sentences containing the question sentence;
    inserting tags into the question sentence and the dialogue sentences;
    performing encoding and word embedding on the tagged question sentence to obtain the question sequence, the question sequence comprising a sequence encoding and a position encoding, the position encoding being a relative position encoding.
  16. The storage device according to claim 15, wherein the step of inserting tags into the question sentence and the dialogue sentences comprises:
    inserting a start tag at the beginning of the question sentence, inserting an end tag at the end of the question sentence, and inserting a separator tag into the dialogue sentences.
  17. The storage device according to claim 10, wherein the step of scoring each splicing result and selecting the candidate answer corresponding to the highest score as the optimal answer to the question sequence comprises:
    calculating, based on a joint probability distribution algorithm, the correlation between the question sequence and the candidate answer in each splicing result;
    scoring the correlation, wherein the higher the degree of the correlation, the higher the corresponding score;
    selecting the candidate answer corresponding to the highest score as the optimal answer to the question sequence.
  18. The storage device according to claim 10, wherein the question answering method further comprises:
    constructing the Transformer model, the network structure of the Transformer model comprising a decoding layer and a mutual information layer located after the decoding layer;
    optimizing the Transformer model using a loss function.
  19. The storage device according to claim 18, wherein the step of optimizing the Transformer model using a loss function further comprises:
    calculating a loss deviation value of the decoding layer and a loss deviation value of the mutual information layer;
    selecting the maximum value of the superposition of the loss deviation value of the decoding layer and the loss deviation value of the mutual information layer as the loss deviation value of the Transformer model;
    updating parameters of the Transformer model according to the loss deviation value of the Transformer model.
  20. The storage device according to claim 19, wherein the loss deviation value of the mutual information layer is calculated according to the following formula: Loss_MMI = Max(P(m/n)), where P represents probability, n represents the vector of the current input question, m represents the vector of the preceding dialogue information before the current input question, and P(m/n) represents the probability of the correlation between the current input question and the preceding dialogue.
PCT/CN2020/121199 2020-07-28 2020-10-15 Question answering method, question answering device and storage device based on Transformer model WO2021139297A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010737212.3 2020-07-28
CN202010737212.3A CN111881279A (zh) 2020-07-28 Question answering method, question answering device and storage device based on Transformer model

Publications (1)

Publication Number Publication Date
WO2021139297A1 true WO2021139297A1 (zh) 2021-07-15

Family

ID=73201394

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/121199 WO2021139297A1 (zh) 2020-07-28 2020-10-15 Question answering method, question answering device and storage device based on Transformer model

Country Status (2)

Country Link
CN (1) CN111881279A (zh)
WO (1) WO2021139297A1 (zh)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113704443A (zh) * 2021-09-08 2021-11-26 天津大学 Dialogue generation method fusing explicit and implicit personalized information
CN115080715A (zh) * 2022-05-30 2022-09-20 重庆理工大学 Span-extraction reading comprehension method based on residual structure and bidirectional fusion attention
CN116595339A (zh) * 2023-07-19 2023-08-15 东方空间技术(山东)有限公司 Intelligent processing method, apparatus and device for aerospace data
CN116737888A (zh) * 2023-01-11 2023-09-12 北京百度网讯科技有限公司 Training method for a dialogue generation model, and method and apparatus for determining reply text
CN117992599A (zh) * 2024-04-07 2024-05-07 腾讯科技(深圳)有限公司 Question answering method and apparatus based on a large language model, and computer device
CN118093837A (zh) * 2024-04-23 2024-05-28 豫章师范学院 Psychological support question-answering text generation method and system based on a Transformer dual-decoder structure

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112612881B (zh) * 2020-12-28 2022-03-25 电子科技大学 Transformer-based Chinese intelligent dialogue method
CN113064972A (zh) * 2021-04-12 2021-07-02 平安国际智慧城市科技股份有限公司 Intelligent question answering method, apparatus, device and storage medium
CN113704437B (zh) * 2021-09-03 2023-08-11 重庆邮电大学 Knowledge base question answering method fusing a multi-head attention mechanism and relative position encoding
CN114328908A (zh) * 2021-11-08 2022-04-12 腾讯科技(深圳)有限公司 Question-answer sentence quality inspection method and apparatus, and related products
CN116737894B (zh) * 2023-06-02 2024-02-20 深圳市客一客信息科技有限公司 Intelligent robot service system based on model training

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190354567A1 (en) * 2018-05-18 2019-11-21 Google Llc Universal transformers
CN110647619A (zh) * 2019-08-01 2020-01-03 中山大学 Commonsense question answering method based on question generation and convolutional neural networks

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110502627A (zh) * 2019-08-28 2019-11-26 上海海事大学 Answer generation method based on a multi-layer Transformer aggregation encoder
CN110543552B (zh) * 2019-09-06 2022-06-07 网易(杭州)网络有限公司 Dialogue interaction method and apparatus, and electronic device
CN110543557B (zh) * 2019-09-06 2021-04-02 北京工业大学 Construction method of a medical intelligent question answering system based on an attention mechanism

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190354567A1 (en) * 2018-05-18 2019-11-21 Google Llc Universal transformers
CN110647619A (zh) * 2019-08-01 2020-01-03 中山大学 Commonsense question answering method based on question generation and convolutional neural networks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MANTCHS: "Detailed Explanation of the Network Structure of Each Layer of Transformer; Interviewing Essentials; Code Implementation", 26 September 2019 (2019-09-26), pages 1 - 17, XP009528999, Retrieved from the Internet <URL:https://blog.csdn.net/weixin_41510260/article/details/101445016> *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113704443A (zh) * 2021-09-08 2021-11-26 天津大学 Dialogue generation method fusing explicit and implicit personalized information
CN113704443B (zh) * 2021-09-08 2023-10-13 天津大学 Dialogue generation method fusing explicit and implicit personalized information
CN115080715A (zh) * 2022-05-30 2022-09-20 重庆理工大学 Span-extraction reading comprehension method based on residual structure and bidirectional fusion attention
CN115080715B (zh) * 2022-05-30 2023-05-30 重庆理工大学 Span-extraction reading comprehension method based on residual structure and bidirectional fusion attention
CN116737888A (zh) * 2023-01-11 2023-09-12 北京百度网讯科技有限公司 Training method for a dialogue generation model, and method and apparatus for determining reply text
CN116737888B (zh) * 2023-01-11 2024-05-17 北京百度网讯科技有限公司 Training method for a dialogue generation model, and method and apparatus for determining reply text
CN116595339A (zh) * 2023-07-19 2023-08-15 东方空间技术(山东)有限公司 Intelligent processing method, apparatus and device for aerospace data
CN117992599A (zh) * 2024-04-07 2024-05-07 腾讯科技(深圳)有限公司 Question answering method and apparatus based on a large language model, and computer device
CN118093837A (zh) * 2024-04-23 2024-05-28 豫章师范学院 Psychological support question-answering text generation method and system based on a Transformer dual-decoder structure

Also Published As

Publication number Publication date
CN111881279A (zh) 2020-11-03

Similar Documents

Publication Publication Date Title
WO2021139297A1 (zh) 基于Transformer模型的问答方法、问答装置及存储装置
US11288593B2 (en) Method, apparatus and device for extracting information
US9753914B2 (en) Natural expression processing method, processing and response method, device, and system
WO2020177282A1 (zh) 一种机器对话方法、装置、计算机设备及存储介质
WO2022095380A1 (zh) 基于ai的虚拟交互模型生成方法、装置、计算机设备及存储介质
US20230385560A1 (en) System and Method for Temporal Attention Behavioral Analysis of Multi-Modal Conversations in a Question and Answer System
CN112214591B (zh) 一种对话预测的方法及装置
CN111241237A (zh) 一种基于运维业务的智能问答数据处理方法及装置
WO2020233131A1 (zh) 问答处理方法、装置、计算机设备和存储介质
CN108897896B (zh) 基于强化学习的关键词抽取方法
CN111813909A (zh) 一种智能问答方法和装置
US20230394247A1 (en) Human-machine collaborative conversation interaction system and method
WO2020192307A1 (zh) 基于深度学习的答案抽取方法、装置、计算机设备和存储介质
CN112818106B (zh) 一种生成式问答的评价方法
US20230092736A1 (en) Intelligent question-answering processing method and system, electronic device and storage medium
CN112559706B (zh) 对话生成模型的训练方法、对话方法、设备以及存储介质
US20230008897A1 (en) Information search method and device, electronic device, and storage medium
CN111079418A (zh) 命名体识别方法、装置、电子设备和存储介质
CN113220856A (zh) 一种基于中文预训练模型的多轮对话系统
CN114648016A (zh) 一种基于事件要素交互与标签语义增强的事件论元抽取方法
CN113342948A (zh) 一种智能问答方法及装置
CN116975288A (zh) 文本处理方法及文本处理模型训练方法
CN116662502A (zh) 基于检索增强的金融问答文本生成方法、设备及存储介质
CN113326367B (zh) 基于端到端文本生成的任务型对话方法和系统
CN114281948A (zh) 一种纪要确定方法及其相关设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20911409

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20911409

Country of ref document: EP

Kind code of ref document: A1