WO2020192307A1 - Answer extraction method and apparatus based on deep learning, computer device and storage medium - Google Patents

Answer extraction method and apparatus based on deep learning, computer device and storage medium

Info

Publication number
WO2020192307A1
Authority
WO
WIPO (PCT)
Prior art keywords
matrix
document
question
word
processed
Prior art date
Application number
PCT/CN2020/075553
Other languages
English (en)
French (fr)
Inventor
杨雪峰
徐爽
巨颖
孙宁远
Original Assignee
深圳追一科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳追一科技有限公司 filed Critical 深圳追一科技有限公司
Publication of WO2020192307A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0281 Customer communication at a business location, e.g. providing product or service information, consulting

Definitions

  • This application relates to an answer extraction method and apparatus, a computer device, and a storage medium based on deep learning.
  • Document-based automatic question answering systems generally include three modules: question processing, passage retrieval, and answer processing.
  • The workflow is as follows: the user asks a question in natural language, and the question processing module processes the question; the passage retrieval module then retrieves relevant documents containing the answer from a massive document collection according to the processed question; finally, the answer processing module uses answer extraction techniques to extract the document blocks containing the answer from the relevant documents and returns them to the user.
  • According to various embodiments disclosed in this application, an answer extraction method and apparatus, a computer device, and a storage medium based on deep learning are provided.
  • An answer extraction method based on deep learning includes: obtaining a user question, and obtaining document content related to the user question according to the user question; determining an extraction start position and an extraction end position in the document content based on a deep learning model; and determining the document content between the extraction start position and the extraction end position as the answer corresponding to the user question, and displaying the answer.
  • An answer extraction device based on deep learning includes:
  • an obtaining module, used to obtain a user question and obtain document content related to the user question according to the user question;
  • a processing module, used to determine an extraction start position and an extraction end position in the document content based on a deep learning model; and
  • a display module, used to determine the document content between the extraction start position and the extraction end position as the answer corresponding to the user question, and display the answer.
  • A computer device includes a memory and one or more processors, the memory storing computer-readable instructions that, when executed by the one or more processors, cause the one or more processors to perform the following steps: obtaining a user question, and obtaining document content related to the user question according to the user question; determining an extraction start position and an extraction end position in the document content based on a deep learning model; and determining the document content between the extraction start position and the extraction end position as the answer corresponding to the user question, and displaying the answer.
  • One or more non-volatile storage media storing computer-readable instructions are provided; when the computer-readable instructions are executed by one or more processors, the one or more processors perform the following steps: obtaining a user question, and obtaining document content related to the user question according to the user question; determining an extraction start position and an extraction end position in the document content based on a deep learning model; and determining the document content between the extraction start position and the extraction end position as the answer corresponding to the user question, and displaying the answer.
  • Fig. 1 is a schematic flowchart of an answer extraction method based on deep learning according to one or more embodiments.
  • Fig. 2 is a schematic flowchart of a method for determining an extraction start position and an extraction end position in document content based on a deep learning model according to one or more embodiments.
  • Fig. 3 is a schematic structural diagram of a deep learning network model according to one or more embodiments.
  • Fig. 4 is a schematic structural diagram of an answer extraction device based on deep learning according to one or more embodiments.
  • Fig. 5 is a schematic structural diagram of a processing module in a deep learning-based answer extraction device in one or more embodiments.
  • Fig. 6 is an application scenario diagram of a deep learning-based answer extraction method according to one or more embodiments.
  • Fig. 7 is a block diagram of a computer device according to one or more embodiments.
  • Fig. 1 shows a method for extracting answers based on deep learning according to an exemplary embodiment.
  • The method provided in this embodiment includes the following steps:
  • Step S11: obtain a user question, and obtain document content related to the user question according to the user question.
  • For example, a customer service robot can receive a user question input by the user; the user can input it in natural language, and the input form can be text, voice, etc.
  • For example, the user question is "Can stocks be bought and sold on the same day?".
  • After obtaining the user question, the customer service robot can obtain the related document content using known techniques; for example, the related document content is document content that covers the user question. The specific acquisition of document content related to the user question can be implemented with known technology and is not described in detail here.
  • Based on the above user question, one piece of related document content is, for example: "T+0 trading is a trading method launched by the Shenzhen Stock Exchange at the end of 1993. It means that after investors buy (sell) stocks (or futures) and the transaction is confirmed on the same day, stocks bought that day can be sold the same day, and stocks sold that day can be bought back the same day. From January 1, 1995, in order to ensure the stability of the stock market and prevent excessive crises, China implemented the 'T+1' trading system, under which stocks bought on a given day cannot be sold until the next trading day. Meanwhile, 'T+0' still applies to funds, i.e., funds returned on the same day can be used immediately. B-share stocks use T+1, and the corresponding funds use T+3."
  • Step S12: based on a deep learning model, determine an extraction start position and an extraction end position in the document content.
  • Step S13: determine the document content between the extraction start position and the extraction end position as the answer corresponding to the user question, and display the answer.
  • For example, based on the deep learning model, the extraction start position and the extraction end position are determined to be "1995" and "sold" respectively; then "From January 1, 1995, in order to ensure the stability of the stock market and prevent excessive crises, China implemented the 'T+1' trading system, that is, stocks bought on the same day cannot be sold until the next trading day." is determined as the answer to the user question "Can stocks be bought and sold on the same day?". The answer can then be displayed.
  • The display mode can be text, voice, or another form.
  • In this embodiment, the user question is first obtained, and the document content related to the user question is obtained according to the user question; then, based on the deep learning model, the extraction start position and the extraction end position are determined in the document content; finally, the document content between the extraction start position and the extraction end position is determined as the answer corresponding to the user question, and the answer is displayed.
  • This answer extraction method does not require manually extracted features or hand-crafted matching rules: the obtained user question and the document content related to it are fed directly into the deep learning model, and the most suitable answer matching the user question is obtained from the document content. This simplifies the answer extraction process and improves answer accuracy, thereby greatly improving the efficiency and quality of automatic customer service. A toy sketch of this three-step flow follows.
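Taken together, S11-S13 amount to a retrieve-then-extract pipeline. The toy sketch below is illustrative only: retrieve_documents and predict_span are hypothetical stand-ins for the retrieval step and the deep learning model of Fig. 3 (neither name comes from the patent), and the canned passage plays the role of the retrieved document content.

```python
from typing import List, Tuple

def retrieve_documents(question: str) -> List[str]:
    # S11 (toy stand-in): the patent retrieves related documents from a massive
    # collection; here a single canned passage is returned for illustration.
    return ["From January 1, 1995, in order to ensure the stability of the stock "
            "market and prevent excessive crises, China implemented the 'T+1' "
            "trading system, that is, stocks bought on the same day cannot be "
            "sold until the next trading day."]

def predict_span(question: str, doc: str) -> Tuple[int, int]:
    # S12 (toy stand-in): the real model (Fig. 3) predicts the extraction start
    # and end positions; here we simply point at the whole passage.
    return 0, len(doc)

question = "Can stocks be bought and sold on the same day?"
doc = retrieve_documents(question)[0]
start, end = predict_span(question, doc)
print(doc[start:end])  # S13: the content between start and end is the answer
```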
  • Further, determining the extraction start position and the extraction end position in the document content based on the deep learning model includes the following steps:
  • Step S21: obtain the user question to be processed and the document content to be processed according to the user question and the document content respectively, segment the user question to be processed and the document content to be processed into words, and perform word vector conversion on each word to obtain a first question matrix and a first document matrix.
  • With reference to the schematic structural diagram of the deep learning network model in Fig. 3, the first question matrix and the first document matrix are obtained in S31 corresponding to the document content and the user question; they are represented by word vectors in Fig. 3.
  • Step S22: process the first document matrix so that the processed first document matrix contains question information, and encode the processed first document matrix and the first question matrix separately to obtain a second document matrix and a second question matrix.
  • As shown in Fig. 3, the second document matrix and the second question matrix are obtained in S32 after the first question matrix and the first document matrix are each self-encoded.
  • Step S23: based on an attention mechanism, interactively process the second document matrix and the second question matrix to obtain a third document matrix.
  • Step S24: based on the attention mechanism, perform self-matching processing on the third document matrix to obtain a fourth document matrix.
  • Step S25: based on a pointer network, determine the extraction start position and the extraction end position in the document content according to the fourth document matrix and the second question matrix.
  • In step S21, obtaining the user question to be processed and the document content to be processed according to the user question and the document content includes: splicing all the document content to obtain the document content to be processed; and/or repeating the user question multiple times and splicing the repeated user questions to obtain the user question to be processed, where the number of times the user question is repeated equals the total number of documents.
  • Specifically, the document content to be processed is formed by splicing together the contents of several documents related to the question asked by the user. For example, according to keywords in the user's question, the k documents most similar to the question are retrieved from massive documents on the Internet, and all the content of the k documents is spliced together to form the document content to be passed into the deep learning model; correspondingly, the user question to be processed is formed by splicing k copies of the single question asked by the user.
  • After the document content to be processed and the user question to be processed are obtained, the user question and the document content are segmented into words separately, and word vector conversion is performed on each word.
  • The word vector conversion is completed by a pre-trained word vector model.
  • The user question to be processed (denoted as q) and the document content to be processed (denoted as c) are passed into the pre-trained word vector model, which maps each word in the user question and the document content to a 300-dimensional real-valued vector, i.e., a word vector (also called a parameter matrix), yielding the first question matrix (denoted as q_emb) and the first document matrix (denoted as c_emb). The word vector model represents the mapping from words to word vectors.
  • In step S22, processing the first document matrix so that the processed first document matrix contains question information includes: determining word co-occurrence features, and splicing the word co-occurrence features to the tail of the corresponding document word vectors in the first document matrix to obtain the processed first document matrix.
  • The word co-occurrence features include a first word co-occurrence feature and/or a second word co-occurrence feature. Determining the word co-occurrence features and splicing them to the tail of the corresponding document word vectors in the first document matrix includes:
  • for each word in the document content to be processed, if the word is the same as at least one word in the user question to be processed, determining the first word co-occurrence feature corresponding to that word to be a first value, and otherwise determining it to be a second value, where the first value and the second value are both fixed values used respectively to indicate that the word in the document content appears or does not appear in the user question, and splicing the first word co-occurrence feature to the tail of the word vector corresponding to that word in the first document matrix; and/or,
  • separately computing the similarity value between each word vector in the first document matrix and each word vector in the first question matrix, normalizing the similarity values for each word vector in the first document matrix, and splicing the normalized similarity value, as the second word co-occurrence feature, to the tail of the corresponding word vector in the first document matrix.
  • The first word co-occurrence feature indicates whether a word in the document content has appeared in the user question; the second word co-occurrence feature indicates the similarity between a word in the document content and the words in the user question.
  • The first value may be 1 and the second value may be 0.
  • In a specific implementation, for each word vector in the first document matrix, if the word it corresponds to in the document content to be processed is the same as the word corresponding to at least one word vector in the first question matrix, the first word co-occurrence feature (denoted as wiq_b) is determined to be the first value, i.e., 1; otherwise it is determined to be the second value, i.e., 0. That is, it is judged whether the word in the document content to be processed has appeared in the user question to be processed: if it has, the first word co-occurrence feature is 1, otherwise it is 0, and the determined first word co-occurrence feature is spliced onto the tail of the corresponding word vector in the first document matrix.
  • The formula for the first word co-occurrence feature is $\mathrm{wiq}_b^{(j)} = 1$ if there exists an $i$ with $c_j = q_i$, and $\mathrm{wiq}_b^{(j)} = 0$ otherwise; that is, if the j-th word in the document content to be processed is the same as the i-th word in the user question to be processed, the first word co-occurrence feature of the j-th document word is 1, otherwise 0.
  • The value range of the second word co-occurrence feature is [0, 1]; it is computed as $\mathrm{sim}_{i,j} = v_{wiq}^{\top}(x_j \odot q_i)$ and $\mathrm{wiq}_w^{(j)} = \sum_i \operatorname{softmax}(\mathrm{sim})_{i,j}$,
  • where $v_{wiq}$ is a parameter matrix obtained by pre-training, $x_j$ is the j-th word vector among the document word vectors, $q_i$ is the i-th word vector among the question word vectors, $\mathrm{sim}_{i,j}$ is the similarity score between the j-th word in the document content and the i-th word in the user question, $\operatorname{softmax}(\mathrm{sim})_{i,j}$ is the softmax normalization of $\mathrm{sim}_{i,j}$, and $\mathrm{wiq}_w^{(j)}$ is the second word co-occurrence feature of the j-th document word.
  • After the first and second word co-occurrence features for each word vector in the first document matrix are determined, they are spliced to the tail of the corresponding word vector to obtain the processed first document matrix. For example, if each word vector has dimension 300 and the document word vector matrix has shape [batch_size, seq_len, 300] before splicing, then after the two word co-occurrence features are concatenated to each word vector the shape becomes [batch_size, seq_len, 302].
  • The processed first document matrix obtained through the above process thus contains question information.
  • After the document content and the user question are mapped to document word vectors and question word vectors, the word vectors are processed in batches: batch_size is the number of document word vector sequences per batch, and seq_len is the length of the document word vector sequence.
  • Note that the first and second word co-occurrence features each have length equal to the paragraph length, i.e., one value per document word. If a word vector in the first document matrix matches at least one word vector in the first question matrix, the first word co-occurrence feature appended to the tail of that word vector is 1, i.e., the value in the added dimension is 1; otherwise it is 0. Similarly, if the similarity value between a word vector in the first document matrix and a word vector in the first question matrix is 0.6, the second word co-occurrence feature appended to the tail of that word vector is 0.6, i.e., the value in the added dimension is 0.6.
  • Further, encoding the processed first document matrix and the first question matrix separately to obtain the second document matrix and the second question matrix includes:
  • using the processed first document matrix as the input of a preset first GRU network, processing it with the first GRU network, and determining the output of the output layer of the first GRU network as the second document matrix; and
  • determining an input question matrix according to the first question matrix, using the input question matrix as the input of a preset second GRU network, processing it with the second GRU network, and determining the output of the output layer of the second GRU network as the second question matrix.
  • Determining the input question matrix according to the first question matrix includes:
  • if the first GRU network is different from the second GRU network, determining the first question matrix as the input question matrix; or,
  • if the first GRU network is the same as the second GRU network, splicing preset features at the tail of each word vector in the first question matrix to obtain a spliced question matrix, and determining the spliced question matrix as the input question matrix, where the number of preset features is the same as the number of word co-occurrence features.
  • Specifically, since the processed first document matrix has word co-occurrence features added, its dimension differs from that of the first question matrix. Encoding matrices of different dimensions requires different encoders.
  • If the processed first document matrix and the first question matrix are to be encoded by the same encoder, two preset features need to be spliced to the tail of each word vector in the first question matrix to obtain the processed first question matrix.
  • In this way, the dimension of the processed first question matrix is the same as that of the processed first document matrix.
  • The two preset features can both be, but are not limited to, 1; that is, the values in the two dimensions added at the tail of each word vector in the first question matrix are both 1.
  • In one embodiment, the encoder is a Gated Recurrent Unit (GRU) network: the processed first document matrix and the processed first question matrix are encoded through a preset number of layers of GRU networks to obtain the second document matrix (denoted as C) and the second question matrix (denoted as Q). The second document matrix obtained at this point contains the information of the user question.
  • Step S22 is equivalent to a "first reading with the question in mind": the question and the document are understood through GRU encoding.
  • In step S23, interactively processing the second document matrix and the second question matrix based on the attention mechanism to obtain the third document matrix includes:
  • based on the attention mechanism, processing the second document matrix and the second question matrix to obtain an interaction matrix, so that the interaction matrix contains comparison information between the document and the question; and
  • using the interaction matrix as the input of a preset third GRU network, processing the interaction matrix with the third GRU network, and determining the output of the output layer of the third GRU network as the third document matrix.
  • Processing the second document matrix and the second question matrix to obtain the interaction matrix containing document-question comparison information includes:
  • computing a document-question word-pair similarity matrix according to the second document matrix and the second question matrix, and determining a first normalized matrix and a second normalized matrix according to the word-pair similarity matrix;
  • based on the attention mechanism, performing a weighting operation with the first normalized matrix and the second question matrix, and performing a weighting operation with the first normalized matrix, the second normalized matrix, and the second document matrix, to compute a first interactive attention matrix and a second interactive attention matrix respectively; and
  • splicing, in sequence, the second document matrix, the first interactive attention matrix, the dot-product matrix of the second document matrix and the first interactive attention matrix, and the dot-product matrix of the second document matrix and the second interactive attention matrix, and determining the spliced matrix as the interaction matrix.
  • Specifically, a trilinear function is used to compute the document-question word-pair similarity matrix from the second document matrix and the second question matrix: $f(q, c) = W_0[q;\,c;\,q \odot c]$, where $W_0$ is a weight matrix, q is the word vector representation of each word in the second question matrix obtained in step S22, c is the word vector representation of each word in the second document matrix obtained in step S22, and f(q, c) is the document-question word-pair similarity matrix.
  • In one embodiment, the weight matrix $W_0$ is obtained by random initialization.
  • The rows and columns of the document-question word-pair similarity matrix are then normalized separately to obtain a row-normalized matrix $\bar{S}$ and a column-normalized matrix $\bar{\bar{S}}$. The row-normalized matrix is the first normalized matrix, and the column-normalized matrix is the second normalized matrix.
  • According to the row-normalized matrix and the transpose of the second question matrix, the document-to-question attention matrix is computed as $A = \bar{S}\,Q^{\top}$, where $Q^{\top}$ is the transpose of the second question matrix and A is the document-to-question attention matrix.
  • According to the row-normalized matrix, the column-normalized matrix, and the transpose of the second document matrix, the question-to-document attention matrix is computed as $B = \bar{S}\,\bar{\bar{S}}^{\top} C^{\top}$, where $C^{\top}$ is the transpose of the second document matrix, $\bar{\bar{S}}^{\top}$ is the transpose of the column-normalized matrix, and B is the question-to-document attention matrix.
  • Then the second document matrix C, the document-to-question attention matrix A, the matrix obtained by dot-multiplying C and A, and the matrix obtained by dot-multiplying C and the question-to-document attention matrix B are spliced in sequence to obtain the spliced matrix $[C;\,A;\,C \odot A;\,C \odot B]$. After the spliced matrix is encoded by one layer of GRU network, the third document matrix is obtained, which fuses the comparison information of the question and the document.
  • Note that the dimension of the document matrix C is [batch_size, paragraph length, hidden] and the dimension of the question matrix is [batch_size, question length, hidden]; during splicing, each matrix is concatenated along the last dimension of the preceding matrix, so the spliced matrix has dimension [batch_size, paragraph length, hidden*4], where hidden is the number of hidden-layer neurons after encoding.
  • Step S23 is equivalent to "reading the document with the question in mind" and "reading the question with the document in mind".
  • The second document matrix C and the second question matrix Q obtained in the previous step are "compared": through the attention mechanism, for each word in the document content, its attention distribution over the words in the user question is computed, i.e., the question word vectors are used to weigh the document word vectors; and for each word in the user question, its attention distribution over the words in the document content is computed, i.e., the document word vectors are used to weigh the question word vectors.
  • step S24 based on the attention mechanism, perform self-matching processing on the third document matrix to obtain a fourth document matrix; including:
  • the self-matching matrix is used as the input of the preset bidirectional cyclic neural network, the self-matching matrix is processed by the bidirectional cyclic neural network, and the hidden layer output of the bidirectional cyclic neural network is determined as the fourth document matrix .
  • processing the third document matrix based on the attention mechanism to obtain the self-matching matrix includes:
  • the third document matrix and the self-matching attention matrix are spliced to obtain a spliced matrix, and the self-matching matrix is determined according to the spliced matrix.
  • the self-matching similarity matrix between the document and the document itself is calculated; the formula is:
  • v T represents the vector representation of the third document matrix obtained in step S23 that combines the comparison information of the question and the document
  • j represents the word index
  • the self-matching similarity matrix between the document and the document itself is normalized to obtain the normalized matrix; the formula is: among them, Represents the normalized matrix.
  • the self-matching attention matrix of each word in the document corresponding to the entire document is calculated; its formula is: Its meaning is: the weight and the word vector in the third document matrix corresponding to the weight are summed to obtain a self-matching attention matrix c t , which represents the semantic vector of each word in the document corresponding to the entire document. Join the third document matrix and the self-matching attention matrix to get the joined matrix
  • the determining the self-matching matrix according to the spliced matrix includes:
  • weighting is performed on the spliced matrix, and the weighted matrix is determined as a self-matching matrix.
  • the resulting spliced matrix can be directly determined as a self-matching matrix; or, in order to control the importance of different parts of the document content, an additional gate control mechanism can be introduced to adaptively control the importance of different parts.
  • Sex That is, first calculate the weight matrix (denoted as g t ) according to the sigmoid function and the self-matching attention matrix c t corresponding to each word in the document and the third document matrix to control the importance of different parts of the document. Then, g t And the resulting spliced matrix Perform dot product calculation to obtain a new matrix, and determine the obtained new matrix as a self-matching matrix; its formula is:
  • W g represents the parameter matrix obtained by random initialization, Represents the resulting spliced matrix, Represents the self-matching matrix processed by the gate control mechanism.
  • the self-matching matrix is used as input into the Bi-directional Recurrent Neural Network (BiRNN) to obtain the output value of the hidden layer node of the BiRNN, and the output value is determined as the fourth Document matrix. It is expressed as:
  • BiRNN hidden layer node It is the output of BiRNN hidden layer node at time t.
  • step S24 it is equivalent to reading the document again and doing the "third reading” for the understanding of the document and the understanding of the document for the problem.
  • In step S25, determining the extraction start position and the extraction end position in the document content based on the pointer network and according to the fourth document matrix and the second question matrix includes:
  • performing restoration processing on the second question matrix to obtain a restored question matrix; for each document content, computing an attention matrix according to the fourth document matrix and the restored question matrix,
  • where the attention matrix represents the semantic representation of the question word vectors with respect to the document word vectors
  • and includes an attention matrix at a first moment and an attention matrix at a second moment;
  • for each document content, computing the probability value corresponding to each document word according to the attention matrix, determining the document word corresponding to the maximum probability value at the first moment as the extraction start position of the corresponding document content, and determining the document word corresponding to the maximum probability value at the second moment as the extraction end position of the corresponding document content; and
  • determining the document content to be selected according to the products of the probabilities of the extraction start positions and extraction end positions corresponding to the different document contents, and determining the extraction start position and extraction end position corresponding to the selected document content as the final extraction start position and extraction end position to be adopted.
  • Since the user question to be processed was originally formed by splicing several copies of the single question asked by the user, the user question is first restored to a single question; that is, the encoded second question matrix obtained in step S22 is restored to a single-question matrix. The attention matrix of the single-question matrix with respect to the fourth document matrix obtained in step S24 is then computed, and the start/end positions corresponding to the maximum probability values in the attention matrix are taken as the start/end positions for answer extraction. Because the document content is spliced from the contents of several related documents, several pairs of start and end positions are obtained; the product of the probability values of each pair is computed, and the pair with the largest product is selected as the final answer extraction start/end position.
  • For example, if the document content is spliced from the contents of 5 related documents, 5 pairs of start and end positions are finally obtained. The probability values of each of the 5 pairs are multiplied to obtain 5 probability products, the pair with the largest product is selected, and its start/end positions are used as the final answer extraction start/end positions.
  • Specifically, an attention-weighted semantic vector of the document is computed at each step from the fourth document matrix and fed into an RNN, whose hidden-layer output $h_t^a$ conditions the next step. The restored single-question matrix is first pooled into a question vector used as the initial state: $s_j = v^{\top}\tanh(W_u^Q u_j^Q + W_v^Q V_r^Q)$, $a_i = \exp(s_i) / \sum_j \exp(s_j)$, and $r^Q = \sum_i a_i\, u_i^Q$,
  • where $u^Q$ is the vector representation of the question matrix obtained after encoding in step S22 and restored to the length of a single question,
  • $s_j$ is the similarity matrix between the single question and the single question itself,
  • $a_i$ is the normalized matrix obtained by normalizing the similarity matrix,
  • and $r^Q$ is the question vector obtained after the weighted summation of the single-question matrix.
  • Then the similarity matrix between the document and the question is computed as $s_j^t = v^{\top}\tanh(W_h^P h_j^P + W_h^a h_{t-1}^a)$,
  • where $h^P$ is the fourth document matrix.
  • The normalized document-question similarity is maximized, and the position corresponding to the maximum value is determined as the extraction start position or the extraction end position:
  • $a_i^t = \exp(s_i^t) / \sum_j \exp(s_j^t)$ and $p^t = \arg\max(a_1^t, \ldots, a_n^t)$,
  • where $p^t$ is the position of the maximum value in $a^t$ and corresponds to the start/end position of the answer in the document content;
  • the maximum position of $a^1$ is the start position of answer extraction, the maximum position of $a^2$ is the end position of answer extraction,
  • and n is the document length.
  • For example, with 5 spliced documents, the five pairs of probability values are $(a^1_{(1)}, a^2_{(1)}), \ldots, (a^1_{(5)}, a^2_{(5)})$; the product of each pair is computed, the pair with the largest product is selected,
  • and the document words corresponding to that pair of probability values are used as the start and end positions of answer extraction.
  • In the above formulas, t is the time index; in the formulas involved in step S25, t takes two values, namely the first moment and the second moment.
  • The document word corresponding to the maximum probability value at the first moment is determined as the extraction start position of the corresponding document content, and the document word corresponding to the maximum probability value at the second moment is determined as the extraction end position corresponding to that extraction start position. A sketch of this step follows.
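A minimal sketch of step S25 in the pointer-network style the formulas above follow. All parameters are random stand-ins, the $V_r^Q$ bias term is omitted, the per-step state update is simplified (the full model feeds the attended vector through an RNN cell), and the even split of the spliced document into k = 5 segments is assumed for illustration.

```python
import torch
import torch.nn.functional as F

hidden, n, m, k = 128, 40, 8, 5
hP = torch.randn(n, hidden)        # fourth document matrix
uQ = torch.randn(m, hidden)        # restored single-question matrix
W_u, W_h, W_a = (torch.randn(hidden, hidden) for _ in range(3))
v = torch.randn(hidden)

# Pool the question into r_Q, used as the initial pointer state.
a_q = F.softmax(torch.tanh(uQ @ W_u) @ v, dim=0)
rQ = a_q @ uQ

probs, state = [], rQ
for _ in range(2):                                   # t = 1 (start), t = 2 (end)
    s = torch.tanh(hP @ W_h + state @ W_a) @ v       # attention over document words
    a = F.softmax(s, dim=0)
    probs.append(a)
    state = a @ hP                                   # simplified state update
start_p, end_p = probs

# The document content is k spliced documents: take the best start/end pair per
# document, then keep the pair whose probability product is largest.
bounds = [i * n // k for i in range(k + 1)]
pairs = []
for d in range(k):
    lo, hi = bounds[d], bounds[d + 1]
    s_pos = lo + int(start_p[lo:hi].argmax())
    e_pos = lo + int(end_p[lo:hi].argmax())
    pairs.append((s_pos, e_pos, float(start_p[s_pos] * end_p[e_pos])))
start, end, _ = max(pairs, key=lambda p: p[2])
print("extraction start/end:", start, end)
```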
  • Fig. 4 is a schematic structural diagram of an answer extraction device based on deep learning according to another exemplary embodiment. As shown in Fig. 4, the answer extraction device based on deep learning includes:
  • an obtaining module 41, configured to obtain a user question and obtain document content related to the user question according to the user question;
  • a processing module 42, configured to determine an extraction start position and an extraction end position in the document content based on a deep learning model; and
  • a display module 43, configured to determine the document content between the extraction start position and the extraction end position as the answer corresponding to the user question, and display the answer.
  • As shown in Fig. 5, the processing module 42 includes:
  • a first processing unit 421, configured to obtain the user question to be processed and the document content to be processed according to the user question and the document content respectively, segment the user question to be processed and the document content to be processed into words, and perform word vector conversion on each word to obtain the first question matrix and the first document matrix;
  • a second processing unit 422, configured to process the first document matrix so that the processed first document matrix contains question information, and encode the processed first document matrix and the first question matrix separately to obtain the second document matrix and the second question matrix respectively;
  • a third processing unit 423, configured to interactively process the second document matrix and the second question matrix based on the attention mechanism to obtain the third document matrix;
  • a fourth processing unit 424, configured to perform self-matching processing on the third document matrix based on the attention mechanism to obtain the fourth document matrix; and
  • a fifth processing unit 425, configured to determine the extraction start position and the extraction end position in the document content based on the pointer network and according to the fourth document matrix and the second question matrix.
  • Further, the first processing unit 421 is specifically configured to: splice all the document content to obtain the document content to be processed; and/or repeat the user question multiple times and splice the repeated user questions to obtain the user question to be processed, where the number of repetitions of the user question equals the total number of documents.
  • Further, the second processing unit 422 is specifically configured to: determine the word co-occurrence features and splice them to the tail of the corresponding document word vectors in the first document matrix to obtain the processed first document matrix.
  • Further, the word co-occurrence features include a first word co-occurrence feature and/or a second word co-occurrence feature, and the second processing unit 422 is specifically configured to: for each word in the document content to be processed, if the word is the same as at least one word in the user question to be processed, determine the first word co-occurrence feature corresponding to that word to be the first value, and otherwise determine it to be the second value, where the first value and the second value are both fixed values used respectively to indicate whether the word in the document content appears or does not appear in the user question, and splice the first word co-occurrence feature to the tail of the word vector corresponding to that word in the first document matrix; and/or separately compute the similarity value between each word vector in the first document matrix and each word vector in the first question matrix, normalize the similarity values for each word vector in the first document matrix, and splice the normalized similarity value, as the second word co-occurrence feature, to the tail of the corresponding word vector in the first document matrix.
  • Further, the second processing unit 422 is specifically configured to: use the processed first document matrix as the input of the preset first GRU network, process it with the first GRU network, and determine the output of the output layer of the first GRU network as the second document matrix; and determine the input question matrix according to the first question matrix, use the input question matrix as the input of the preset second GRU network, process it with the second GRU network, and determine the output of the output layer of the second GRU network as the second question matrix.
  • Further, the second processing unit 422 is specifically configured to: if the first GRU network is different from the second GRU network, determine the first question matrix as the input question matrix; or, if the first GRU network is the same as the second GRU network, splice preset features at the tail of each word vector in the first question matrix to obtain the spliced question matrix, and determine the spliced question matrix as the input question matrix, where the number of preset features is the same as the number of word co-occurrence features.
  • Further, the third processing unit 423 is specifically configured to: based on the attention mechanism, process the second document matrix and the second question matrix to obtain the interaction matrix, so that the interaction matrix contains comparison information between the document and the question; use the interaction matrix as the input of the preset third GRU network, process it with the third GRU network, and determine the output of the output layer of the third GRU network as the third document matrix.
  • Further, the third processing unit 423 is specifically configured to: compute the document-question word-pair similarity matrix according to the second document matrix and the second question matrix, and determine the first normalized matrix and the second normalized matrix according to the word-pair similarity matrix; based on the attention mechanism, perform a weighting operation with the first normalized matrix and the second question matrix, and perform a weighting operation with the first normalized matrix, the second normalized matrix, and the second document matrix, to obtain the first interactive attention matrix and the second interactive attention matrix respectively; and splice, in sequence, the second document matrix, the first interactive attention matrix, the dot-product matrix of the second document matrix and the first interactive attention matrix, and the dot-product matrix of the second document matrix and the second interactive attention matrix, and determine the spliced matrix as the interaction matrix.
  • Further, the fourth processing unit 424 is specifically configured to: process the third document matrix based on the attention mechanism to obtain the self-matching matrix; use the self-matching matrix as the input of the preset bidirectional recurrent neural network, process it with the bidirectional recurrent neural network, and determine the hidden-layer output of the bidirectional recurrent neural network as the fourth document matrix.
  • Further, the fourth processing unit 424 is specifically configured to: compute the document-to-document self-matching similarity matrix according to the third document matrix, and determine the self-matching weighting matrix according to the self-matching similarity matrix; based on the attention mechanism, perform a weighting operation on the third document matrix with the self-matching weighting matrix to compute the self-matching attention matrix; and splice the third document matrix and the self-matching attention matrix to obtain the spliced matrix, and determine the self-matching matrix according to the spliced matrix.
  • Further, the fourth processing unit 424 is specifically configured to: determine the spliced matrix as the self-matching matrix; or, based on the gate control mechanism, perform weighting processing on the spliced matrix and determine the weighted matrix as the self-matching matrix.
  • Further, the fifth processing unit 425 is specifically configured to: perform restoration processing on the second question matrix to obtain the restored question matrix; for each document content, compute the attention matrix according to the fourth document matrix and the restored question matrix, where the attention matrix represents the semantic representation of the question word vectors with respect to the document word vectors and includes the attention matrix at the first moment and the attention matrix at the second moment; for each document content, compute the probability value corresponding to each document word according to the attention matrix, determine the document word corresponding to the maximum probability value at the first moment as the extraction start position of the corresponding document content, and determine the document word corresponding to the maximum probability value at the second moment as the extraction end position of the corresponding document content; and determine the document content to be selected according to the products of the probabilities of the extraction start positions and extraction end positions corresponding to the different document contents, and determine the extraction start position and extraction end position corresponding to the selected document content as the final extraction start position and extraction end position.
  • With this device, the user question is first obtained by the obtaining module, and the document content related to the user question is obtained according to the user question; the processing module then determines the extraction start position and the extraction end position in the document content based on the deep learning model; and the display module determines the document content between the extraction start position and the extraction end position as the answer corresponding to the user question and displays the answer.
  • Using this answer extraction device does not require manually extracted features or hand-crafted matching rules: the obtained user question and the document content related to it are fed directly into the deep learning model, and the most suitable answer matching the user question is obtained from the document content. This simplifies the answer extraction process and improves answer accuracy, thereby greatly improving the efficiency and quality of automatic customer service.
  • Each module in the above-mentioned deep learning-based answer extraction device can be implemented in whole or in part by software, hardware and a combination thereof.
  • The foregoing modules may be embedded in hardware form in, or be independent of, the processor of the computer device, or may be stored in software form in the memory of the computer device, so that the processor can call and execute the operations corresponding to the foregoing modules.
  • the answer extraction method based on deep learning can be applied to the application environment as shown in FIG. 6.
  • the terminal 61 and the server 62 communicate through the network.
  • the terminal 61 obtains the user question, and obtains the document content related to the user question according to the user question; based on the deep learning model, determines the extraction start position and the extraction end position in the document content; and extracts the The document content between the start position and the extraction end position is determined as the answer corresponding to the user question, and the answer is displayed.
  • the terminal 61 may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices.
  • the server 62 may be implemented by an independent server or a server cluster composed of multiple servers.
  • a computer device is provided.
  • the computer device may be a terminal 61 or a server 62 as shown in FIG. 6, and its internal structure diagram may be as shown in FIG. 7.
  • The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • The non-volatile storage medium stores an operating system, computer-readable instructions, and a database.
  • The internal memory provides an environment for the execution of the operating system and the computer-readable instructions in the non-volatile storage medium.
  • The database of the computer device is used to store document content data related to user questions.
  • The network interface of the computer device is used to communicate with external terminals through a network connection.
  • The computer-readable instructions are executed by the processor to implement an answer extraction method based on deep learning.
  • Those skilled in the art can understand that Fig. 7 is only a block diagram of part of the structure related to the solution of the present application and does not constitute a limitation on the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
  • In one embodiment, a computer device is provided, including a memory and one or more processors, the memory storing computer-readable instructions that, when executed by the one or more processors, cause the one or more processors to implement the steps of the deep-learning-based answer extraction method provided in any one of the embodiments of the present application.
  • In one embodiment, one or more non-volatile computer-readable storage media storing computer-readable instructions are provided; when the computer-readable instructions are executed by one or more processors, the one or more processors implement the steps of the deep-learning-based answer extraction method provided in any one of the embodiments of the present application.
  • It should be understood that each part of this application can be implemented by hardware, software, firmware, or a combination thereof.
  • In the above embodiments, multiple steps or methods can be implemented by software or firmware stored in a memory and executed by a suitable instruction execution system.
  • For example, if implemented in hardware, any one or a combination of the following technologies known in the art can be used: discrete logic circuits with logic gate circuits for implementing logic functions on data signals, application-specific integrated circuits with suitable combinational logic gates, programmable gate arrays (PGA), field programmable gate arrays (FPGA), and so on.
  • the functional units in the various embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units may be integrated into one module.
  • the above-mentioned integrated modules can be implemented in the form of hardware or software functional modules. If the integrated module is implemented in the form of a software function module and sold or used as an independent product, it can also be stored in a computer readable storage medium.
  • the aforementioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Finance (AREA)
  • Artificial Intelligence (AREA)
  • Strategic Management (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Development Economics (AREA)
  • Accounting & Taxation (AREA)
  • Health & Medical Sciences (AREA)
  • Game Theory and Decision Science (AREA)
  • General Business, Economics & Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An answer extraction method, apparatus, and storage medium based on deep learning. The method includes: obtaining a user question and, according to the user question, obtaining document content related to the user question (S11); determining an extraction start position and an extraction end position in the document content based on a deep learning model (S12); and determining the document content between the extraction start position and the extraction end position as the answer corresponding to the user question, and displaying the answer (S13). The method does not require manually extracted features or hand-crafted matching rules: the obtained user question and the document content related to it are fed directly into the deep learning model, and the most suitable answer matching the user question is obtained from the document content, which simplifies the answer extraction process and improves answer accuracy, thereby greatly improving the efficiency and quality of automatic customer service.

Description

ANSWER EXTRACTION METHOD AND APPARATUS BASED ON DEEP LEARNING, COMPUTER DEVICE AND STORAGE MEDIUM
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to the Chinese patent application filed with the China Patent Office on March 22, 2019, with application number 2019102251350 and the title "Answer extraction method, apparatus and storage medium based on deep learning", the entire contents of which are incorporated herein by reference.
TECHNICAL FIELD
This application relates to an answer extraction method and apparatus, a computer device, and a storage medium based on deep learning.
BACKGROUND
At present, in order to reduce the workload of customer service staff and improve office efficiency, many businesses use "intelligent customer service" to automatically answer some customer questions. Such "intelligent customer service" is mostly a document-based automatic question answering system. A document-based automatic question answering system generally includes three modules: question processing, passage retrieval, and answer processing. Its workflow is as follows: the user asks a question in natural language, and the question processing module processes the question; the passage retrieval module then retrieves relevant documents containing the answer from a massive document collection according to the processed question; finally, the answer processing module uses answer extraction techniques to extract the document blocks containing the answer from the relevant documents and returns them to the user.
In the related art, the answer processing module of such an automatic question answering system often uses different answer extraction methods for different types of questions. For example, for simple factual questions, answers can be matched simply based on a bag-of-words model, i.e., named entities consistent with the expected answer type are extracted from document segments as candidate answers; answers can also be matched based on surface patterns, the basic idea being that there are always certain surface relations between the answer and the keywords of the question, so the algorithm does not use much deep language processing but instead extracts candidate answers satisfying surface rule patterns from document segments. Such answer extraction methods require manually extracted features and hand-crafted matching rules, which makes the answer extraction process cumbersome and reduces the accuracy of the extracted answers, affecting the efficiency and quality of automatic customer service.
SUMMARY
According to various embodiments disclosed in this application, an answer extraction method and apparatus, a computer device, and a storage medium based on deep learning are provided.
An answer extraction method based on deep learning includes:
obtaining a user question, and obtaining document content related to the user question according to the user question;
determining an extraction start position and an extraction end position in the document content based on a deep learning model; and
determining the document content between the extraction start position and the extraction end position as the answer corresponding to the user question, and displaying the answer.
An answer extraction device based on deep learning includes:
an obtaining module, used to obtain a user question and obtain document content related to the user question according to the user question;
a processing module, used to determine an extraction start position and an extraction end position in the document content based on a deep learning model; and
a display module, used to determine the document content between the extraction start position and the extraction end position as the answer corresponding to the user question, and display the answer.
A computer device includes a memory and one or more processors, the memory storing computer-readable instructions that, when executed by the one or more processors, cause the one or more processors to perform the following steps:
obtaining a user question, and obtaining document content related to the user question according to the user question;
determining an extraction start position and an extraction end position in the document content based on a deep learning model; and
determining the document content between the extraction start position and the extraction end position as the answer corresponding to the user question, and displaying the answer.
One or more non-volatile storage media storing computer-readable instructions are provided; when the computer-readable instructions are executed by one or more processors, the one or more processors perform the following steps:
obtaining a user question, and obtaining document content related to the user question according to the user question;
determining an extraction start position and an extraction end position in the document content based on a deep learning model; and
determining the document content between the extraction start position and the extraction end position as the answer corresponding to the user question, and displaying the answer.
The details of one or more embodiments of this application are set forth in the following drawings and description. Other features and advantages of this application will become apparent from the specification, the drawings, and the claims.
BRIEF DESCRIPTION OF THE DRAWINGS
In order to explain the technical solutions in the embodiments of this application more clearly, the drawings needed in the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of this application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative work.
Fig. 1 is a schematic flowchart of an answer extraction method based on deep learning according to one or more embodiments.
Fig. 2 is a schematic flowchart of a method for determining an extraction start position and an extraction end position in document content based on a deep learning model according to one or more embodiments.
Fig. 3 is a schematic structural diagram of a deep learning network model according to one or more embodiments.
Fig. 4 is a schematic structural diagram of an answer extraction device based on deep learning according to one or more embodiments.
Fig. 5 is a schematic structural diagram of a processing module in an answer extraction device based on deep learning according to one or more embodiments.
Fig. 6 is an application scenario diagram of an answer extraction method based on deep learning according to one or more embodiments.
Fig. 7 is a block diagram of a computer device according to one or more embodiments.
DETAILED DESCRIPTION
Exemplary embodiments are described in detail here, with examples shown in the drawings. Where the following description refers to the drawings, unless otherwise indicated, the same numbers in different drawings denote the same or similar elements. The implementations described in the following exemplary embodiments do not represent all implementations consistent with this application; rather, they are merely examples of methods and apparatuses consistent with some aspects of this application as detailed in the appended claims.
Fig. 1 shows an answer extraction method based on deep learning according to an exemplary embodiment. As shown in Fig. 1, the method provided in this embodiment includes the following steps:
Step S11: obtain a user question, and obtain document content related to the user question according to the user question.
For example, a customer service robot can receive a user question input by the user; the user can input it in natural language, and the input form can be text, voice, etc. For example, the user question is "Can stocks be bought and sold on the same day?".
After obtaining the user question, the customer service robot can obtain the related document content using known techniques; for example, the related document content is document content that covers the user question. The specific acquisition of document content related to the user question can be implemented with known technology and is not described in detail here. Based on the above user question, one piece of related document content is, for example: "T+0 trading is a trading method launched by the Shenzhen Stock Exchange at the end of 1993. It means that after investors buy (sell) stocks (or futures) and the transaction is confirmed on the same day, stocks bought that day can be sold the same day, and stocks sold that day can be bought back the same day. From January 1, 1995, in order to ensure the stability of the stock market and prevent excessive crises, China implemented the 'T+1' trading system, under which stocks bought on a given day cannot be sold until the next trading day. Meanwhile, 'T+0' still applies to funds, i.e., funds returned on the same day can be used immediately. B-share stocks use T+1, and the corresponding funds use T+3."
Step S12: based on a deep learning model, determine an extraction start position and an extraction end position in the document content. Further details of this step are given in the related description below.
Step S13: determine the document content between the extraction start position and the extraction end position as the answer corresponding to the user question, and display the answer.
For example, based on the deep learning model, the extraction start position and the extraction end position are determined to be "1995" and "sold" respectively; then "From January 1, 1995, in order to ensure the stability of the stock market and prevent excessive crises, China implemented the 'T+1' trading system, that is, stocks bought on the same day cannot be sold until the next trading day." is determined as the answer to the user question "Can stocks be bought and sold on the same day?". The answer can then be displayed, in text, voice, or another form.
In this embodiment, the user question is first obtained, and the document content related to the user question is obtained according to the user question; then, based on the deep learning model, the extraction start position and the extraction end position are determined in the document content; finally, the document content between the extraction start position and the extraction end position is determined as the answer corresponding to the user question, and the answer is displayed. This answer extraction method does not require manually extracted features or hand-crafted matching rules: the obtained user question and the document content related to it are fed directly into the deep learning model, and the most suitable answer matching the user question is obtained from the document content, which simplifies the answer extraction process and improves answer accuracy, thereby greatly improving the efficiency and quality of automatic customer service.
Further, referring to Fig. 2, determining the extraction start position and the extraction end position in the document content based on the deep learning model includes the following steps:
Step S21: obtain the user question to be processed and the document content to be processed according to the user question and the document content respectively, segment the user question to be processed and the document content to be processed into words, and perform word vector conversion on each word to obtain a first question matrix and a first document matrix.
With reference to the schematic structural diagram of the deep learning network model in Fig. 3, in S31 the first question matrix and the first document matrix are obtained corresponding to the document content and the user question; they are represented by word vectors in Fig. 3.
Step S22: process the first document matrix so that the processed first document matrix contains question information, and encode the processed first document matrix and the first question matrix separately to obtain a second document matrix and a second question matrix. As shown in Fig. 3, the second document matrix and the second question matrix are obtained in S32 after the first question matrix and the first document matrix are each self-encoded.
Step S23: based on an attention mechanism, interactively process the second document matrix and the second question matrix to obtain a third document matrix. As shown in Fig. 3, the third document matrix is obtained after the attention-based hybrid interaction processing in S33.
Step S24: based on the attention mechanism, perform self-matching processing on the third document matrix to obtain a fourth document matrix. As shown in Fig. 3, self-attention processing is performed in S34 to obtain the fourth document matrix.
Step S25: based on a pointer network, determine the extraction start position and the extraction end position in the document content according to the fourth document matrix and the second question matrix. As shown in Fig. 3, prediction is performed in S35 to obtain the answer finally displayed.
In step S21, obtaining the user question to be processed and the document content to be processed according to the user question and the document content includes:
splicing all the document content to obtain the document content to be processed; and/or,
repeating the user question multiple times and splicing the repeated user questions to obtain the user question to be processed, where the number of times the user question is repeated equals the total number of documents.
Specifically, the document content to be processed is formed by splicing together the contents of several documents related to the question asked by the user. For example, according to keywords in the user's question, the k documents most similar to the question are retrieved from massive documents on the Internet, and all the content of the k documents is spliced together to form the document content to be passed into the deep learning model; correspondingly, the user question to be processed is formed by splicing k copies of the single question asked by the user.
After the document content to be processed and the user question to be processed are obtained, the user question and the document content are segmented into words separately, and word vector conversion is performed on each word. The word vector conversion is completed by a pre-trained word vector model: the user question to be processed (denoted as q) and the document content to be processed (denoted as c) are passed into the pre-trained word vector model, which maps each word in the user question and the document content to a 300-dimensional real-valued vector, i.e., a word vector (also called a parameter matrix), yielding the first question matrix (denoted as q_emb) and the first document matrix (denoted as c_emb). The word vector model represents the mapping from words to word vectors. A sketch of this step is shown below.
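A minimal sketch of this tokenize-and-embed step, assuming a pretrained embedding table of the kind the patent calls a word vector model; the tiny vocabulary and random vectors below are stand-ins for a real pretrained model.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {"股票": 0, "可以": 1, "当天": 2, "买卖": 3, "吗": 4, "T+1": 5, "<unk>": 6}
emb = rng.standard_normal((len(vocab), 300)).astype(np.float32)  # 300-dim word vectors

def to_matrix(tokens):
    # Map each segmented word to its 300-dim word vector; OOV words fall back to <unk>.
    ids = [vocab.get(t, vocab["<unk>"]) for t in tokens]
    return emb[ids]                                   # shape [seq_len, 300]

q_tokens = ["股票", "可以", "当天", "买卖", "吗"]      # q after word segmentation
c_tokens = ["T+1", "股票", "当天", "买卖"]             # c after word segmentation
q_emb = to_matrix(q_tokens)   # first question matrix q_emb, shape [5, 300]
c_emb = to_matrix(c_tokens)   # first document matrix c_emb, shape [4, 300]
print(q_emb.shape, c_emb.shape)
```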
In step S22, processing the first document matrix so that the processed first document matrix contains question information includes: determining word co-occurrence features, and splicing the word co-occurrence features to the tail of the corresponding document word vectors in the first document matrix to obtain the processed first document matrix.
Further, the word co-occurrence features include a first word co-occurrence feature and/or a second word co-occurrence feature. Determining the word co-occurrence features and splicing them to the tail of the corresponding document word vectors in the first document matrix includes:
for each word in the document content to be processed, if the word is the same as at least one word in the user question to be processed, determining the first word co-occurrence feature corresponding to that word to be a first value, and otherwise determining it to be a second value, where the first value and the second value are both fixed values used respectively to indicate that the word in the document content appears or does not appear in the user question, and splicing the first word co-occurrence feature to the tail of the word vector corresponding to that word in the first document matrix; and/or,
separately computing the similarity value between each word vector in the first document matrix and each word vector in the first question matrix, normalizing the similarity values for each word vector in the first document matrix, and splicing the normalized similarity value, as the second word co-occurrence feature, to the tail of the corresponding word vector in the first document matrix.
Specifically, in order for the document content to be processed to contain some information about the user question, two word co-occurrence features are added to each word vector in the first document matrix. This corresponds to how people normally read: if a word from the user question appears somewhere in the document content, the content near that position is very likely to be the answer. Taking the addition of two word co-occurrence features as an example, a first word co-occurrence feature and a second word co-occurrence feature are added to each word vector in the first document matrix; the first indicates whether a word in the document content has appeared in the user question, and the second indicates the similarity between a word in the document content and the words in the user question.
It can be understood that the first value may be 1 and the second value may be 0.
In a specific implementation, for each word vector in the first document matrix, if the word it corresponds to in the document content to be processed is the same as the word corresponding to at least one word vector in the first question matrix, the first word co-occurrence feature (denoted as wiq_b) is determined to be the first value, i.e., 1; otherwise it is determined to be the second value, i.e., 0. That is, it is judged whether the word in the document content to be processed has appeared in the user question to be processed: if it has, the first word co-occurrence feature is 1, otherwise it is 0, and the determined first word co-occurrence feature is spliced onto the tail of the corresponding word vector in the first document matrix.
The formula for the first word co-occurrence feature is:

$$\mathrm{wiq}_b^{(j)} = \begin{cases} 1, & \exists\, i:\ c_j = q_i \\ 0, & \text{otherwise} \end{cases}$$

That is, if the j-th word in the document content to be processed is the same as the i-th word in the user question to be processed, the first word co-occurrence feature of the j-th document word is 1, otherwise it is 0.
Then, the similarity values between each word vector in the first document matrix and each word vector in the first question matrix are computed, the computed similarity values are normalized, and the normalized values are determined as the second word co-occurrence feature.
The value range of the second word co-occurrence feature is [0, 1]; it is computed as:

$$\mathrm{sim}_{i,j} = v_{wiq}^{\top}\,(x_j \odot q_i), \qquad \mathrm{wiq}_w^{(j)} = \sum_i \operatorname{softmax}(\mathrm{sim})_{i,j}$$

where $v_{wiq}$ is a parameter matrix obtained by pre-training, $x_j$ is the j-th word vector among the document word vectors, $q_i$ is the i-th word vector among the question word vectors, $\mathrm{sim}_{i,j}$ is the similarity score between the j-th word in the document content and the i-th word in the user question, $\operatorname{softmax}(\mathrm{sim})_{i,j}$ is the softmax normalization of $\mathrm{sim}_{i,j}$, and $\mathrm{wiq}_w^{(j)}$ is the second word co-occurrence feature of the j-th document word.
After the first and second word co-occurrence features for each word vector in the first document matrix are determined, they are spliced to the tail of the corresponding word vector to obtain the processed first document matrix. For example, if each word vector has dimension 300 and the document word vector matrix has shape [batch_size, seq_len, 300] before splicing, then after the two word co-occurrence features are concatenated to each word vector its shape becomes [batch_size, seq_len, 302]. The processed first document matrix obtained through this process thus contains question information.
Note that after the document content and the user question are mapped to document word vectors and question word vectors, the word vectors are processed in batches: batch_size is the number of document word vector sequences per batch, and seq_len is the length of the document word vector sequence.
Note also that the first and second word co-occurrence features each have length equal to the paragraph length, i.e., one value per document word. If a word vector in the first document matrix matches at least one word vector in the first question matrix, the first word co-occurrence feature appended to the tail of that word vector is 1, i.e., the value in the added dimension is 1; otherwise it is 0. Similarly, if the similarity value between a word vector in the first document matrix and a word vector in the first question matrix is 0.6, the second word co-occurrence feature appended to the tail of that word vector is 0.6, i.e., the value in the added dimension is 0.6. A sketch of this feature computation is shown below.
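A minimal sketch of the two word co-occurrence features following the formulas above. v_wiq stands in for the pretrained parameter matrix, and the softmax axis is an assumption: the patent does not pin it down, so the FastQA-style convention of normalizing over document positions is used here.

```python
import numpy as np

rng = np.random.default_rng(1)
c_tokens = ["T+1", "股票", "当天", "买卖"]          # segmented document
q_tokens = ["股票", "可以", "当天", "买卖", "吗"]   # segmented question
c_emb = rng.standard_normal((len(c_tokens), 300))  # first document matrix
q_emb = rng.standard_normal((len(q_tokens), 300))  # first question matrix
v_wiq = rng.standard_normal(300)                   # stand-in for the pretrained v_wiq

# wiq_b: 1 if the j-th document word also appears in the question, else 0.
wiq_b = np.array([[1.0 if t in q_tokens else 0.0] for t in c_tokens])

# wiq_w: sim_{i,j} = v_wiq^T (x_j ⊙ q_i), softmax-normalized, summed over i.
sim = np.einsum("d,jd,id->ij", v_wiq, c_emb, q_emb)        # [len(q), len(c)]
e = np.exp(sim - sim.max(axis=1, keepdims=True))
alpha = e / e.sum(axis=1, keepdims=True)                   # softmax over document axis j
wiq_w = alpha.sum(axis=0, keepdims=True).T                 # one score per document word

# Splice both features onto the tail of each document word vector: 300 -> 302.
c_processed = np.concatenate([c_emb, wiq_b, wiq_w], axis=1)
print(c_processed.shape)   # (4, 302)
```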
Further, encoding the processed first document matrix and the first question matrix respectively to obtain the second document matrix and the second question matrix includes:

taking the processed first document matrix as the input of a preset first GRU network, processing the processed first document matrix with the first GRU network, and determining the output of the output layer of the first GRU network as the second document matrix; and

determining an input question matrix from the first question matrix, taking the input question matrix as the input of a preset second GRU network, processing the input question matrix with the second GRU network, and determining the output of the output layer of the second GRU network as the second question matrix.

Further, determining the input question matrix from the first question matrix includes:

if the first GRU network is different from the second GRU network, determining the first question matrix as the input question matrix; or,

if the first GRU network is the same as the second GRU network, appending preset features to the tail of every word vector in the first question matrix to obtain a concatenated question matrix, and determining the concatenated question matrix as the input question matrix, where the number of preset features equals the number of word co-occurrence features.

Specifically, because the processed first document matrix has had the word co-occurrence features added, its dimensionality differs from that of the first question matrix. Encoding matrices of different dimensionality requires different encoders; to encode the processed first document matrix and the first question matrix with the same encoder, two preset features must be appended to the tail of every word vector in the first question matrix, giving a processed first question matrix whose dimensionality equals that of the processed first document matrix. The two preset features may be, but are not limited to, 1; that is, the values of the two dimensions added at the tail of every word vector in the first question matrix are both 1.

In one embodiment, the encoder is a gated recurrent unit (GRU) network: the processed first document matrix and the processed first question matrix are encoded through a preset number of GRU layers, giving the second document matrix (denoted C) and the second question matrix (denoted Q). The second document matrix obtained at this point already contains information about the user question.

It should be noted that the preset number may be, but is not limited to, 2, 3, or 4.

Step S22 corresponds to "reading for the first time with the question in mind": the question and the document are understood through GRU encoding.
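A sketch of the shared-encoder variant follows, assuming PyTorch; the constant-1 padding features and the 2-layer bidirectional GRU are illustrative choices, not a fixed configuration from this disclosure.

```python
import torch
import torch.nn as nn

N_COOC = 2  # number of word co-occurrence / preset features

class SharedGRUEncoder(nn.Module):
    """One GRU encodes both matrices; the question is padded with preset
    features (here constant 1s) so its width matches the document's."""
    def __init__(self, emb_dim=300, hidden=128, layers=2):
        super().__init__()
        self.gru = nn.GRU(emb_dim + N_COOC, hidden, num_layers=layers,
                          batch_first=True, bidirectional=True)

    def forward(self, c_proc, q_emb):
        ones = torch.ones(*q_emb.shape[:2], N_COOC)
        q_in = torch.cat([q_emb, ones], dim=-1)   # pad question to 302 dims
        C, _ = self.gru(c_proc)                   # second document matrix
        Q, _ = self.gru(q_in)                     # second question matrix
        return C, Q

enc = SharedGRUEncoder()
c_proc = torch.randn(1, 40, 302)   # [batch, passage_len, 302]
q_emb  = torch.randn(1, 8, 300)    # [batch, question_len, 300]
C, Q = enc(c_proc, q_emb)
print(C.shape, Q.shape)            # [1, 40, 256] and [1, 8, 256]
```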
In step S23, performing attention-based interaction processing on the second document matrix and the second question matrix to obtain the third document matrix includes:

processing the second document matrix and the second question matrix based on the attention mechanism to obtain an interaction matrix, such that the interaction matrix contains the comparison information between the document and the question; and

taking the interaction matrix as the input of a preset third GRU network, processing the interaction matrix with the third GRU network, and determining the output of the output layer of the third GRU network as the third document matrix.

Further, processing the second document matrix and the second question matrix based on the attention mechanism to obtain the interaction matrix containing the document-question comparison information includes:

computing a document-question word-pair similarity matrix from the second document matrix and the second question matrix, and determining a first normalized matrix and a second normalized matrix from the word-pair similarity matrix;

based on the attention mechanism, performing a weighted operation with the first normalized matrix and the second question matrix, and performing a weighted operation with the first normalized matrix, the second normalized matrix and the second document matrix, to compute a first interaction attention matrix and a second interaction attention matrix respectively; and

concatenating, in order, the second document matrix, the first interaction attention matrix, the element-wise product of the second document matrix and the first interaction attention matrix, and the element-wise product of the second document matrix and the second interaction attention matrix, and determining the concatenated matrix as the interaction matrix.

Specifically, a trilinear function is used to compute the similarity matrix of document-question word pairs from the second document matrix and the second question matrix:

$$f(q, c) = W_0\,[q,\ c,\ q \odot c]$$

where W_0 is a weight matrix, q is the word-vector representation of each word in the second question matrix obtained in step S22, c is the word-vector representation of each word in the second document matrix obtained in step S22, and f(q, c) is the similarity matrix of document-question word pairs.

In one embodiment, the weight matrix W_0 is obtained by random initialization.

The rows and columns of the document-question word-pair similarity matrix are then normalized separately, giving the row-normalized matrix (denoted $\bar{S}$) and the column-normalized matrix (denoted $\bar{\bar{S}}$).

The row-normalized matrix is the first normalized matrix, and the column-normalized matrix is the second normalized matrix.

The document-question attention matrix is computed from the row-normalized matrix and the transpose of the second question matrix:

$$A = \bar{S}\,Q^{\top}$$

where Q^T denotes the transpose of the second question matrix and A denotes the document-question attention matrix.

The question-document attention matrix is computed from the row-normalized matrix, the column-normalized matrix and the transpose of the second document matrix:

$$B = \bar{S}\,\bar{\bar{S}}^{\top}C^{\top}$$

where C^T denotes the transpose of the second document matrix, $\bar{\bar{S}}^{\top}$ denotes the transpose of the column-normalized matrix, and B denotes the question-document attention matrix.

Then the second document matrix C, the document-question attention matrix A, the element-wise product of C and A, and the element-wise product of C and the question-document attention matrix B are concatenated in order, giving the concatenated matrix

$$[C;\ A;\ C \odot A;\ C \odot B]$$

The concatenated matrix is encoded through one GRU layer, giving the third document matrix, which fuses the comparison information between the question and the document.

It should be noted that the document matrix C has shape [batch_size, passage length, hidden] and the question matrix has shape [batch_size, question length, hidden]; the matrices are concatenated in order along the last dimension of the preceding matrix, so the concatenated matrix has shape [batch_size, passage length, hidden*4]. Here hidden is the number of hidden-layer neurons after encoding.

Step S23 corresponds to "reading the document with the question in mind" and "reading the question with the document in mind". The second document matrix C and the second question matrix Q obtained in the previous step are "compared": through the attention mechanism, for every word of the document content its attention distribution over every word of the user question is computed, i.e., the document word vectors are measured with the question word vectors; and for every word of the user question its attention distribution over every word of the document content is computed, i.e., the question word vectors are measured with the document word vectors. In this way a connection is established between the user question and the relevant parts of the document content, so as to locate, within the document content, the parts that are truly useful for answering the user question.
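A numpy sketch of this interaction step, following the formulas above (trilinear similarity, row/column softmax, then [C; A; C⊙A; C⊙B]); the final GRU encoding layer is omitted, and the decomposed weight vectors w_c, w_q, w_cq are illustrative assumptions equivalent to one W_0.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def interaction(C, Q, w_c, w_q, w_cq):
    """C: [n, d] document, Q: [m, d] question. Returns the [n, 4d]
    interaction matrix [C; A; C*A; C*B] (GRU encoding omitted here)."""
    # trilinear similarity S[t, j] = w_c.C_t + w_q.Q_j + w_cq.(C_t * Q_j)
    S = (C @ w_c)[:, None] + (Q @ w_q)[None, :] + (C * w_cq) @ Q.T
    S_row = softmax(S, axis=1)          # row-normalized matrix
    S_col = softmax(S, axis=0)          # column-normalized matrix
    A = S_row @ Q                       # document-question attention, [n, d]
    B = S_row @ S_col.T @ C             # question-document attention, [n, d]
    return np.concatenate([C, A, C * A, C * B], axis=1)

rng = np.random.default_rng(2)
n, m, d = 40, 8, 256
C, Q = rng.standard_normal((n, d)), rng.standard_normal((m, d))
G = interaction(C, Q, *(rng.standard_normal(d) * 0.05 for _ in range(3)))
print(G.shape)  # (40, 1024)
```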
In step S24, performing attention-based self-matching processing on the third document matrix to obtain the fourth document matrix includes:

processing the third document matrix based on the attention mechanism to obtain a self-matching matrix; and

taking the self-matching matrix as the input of a preset bi-directional recurrent neural network, processing the self-matching matrix with the bi-directional recurrent neural network, and determining the hidden-layer output of the bi-directional recurrent neural network as the fourth document matrix.

Further, processing the third document matrix based on the attention mechanism to obtain the self-matching matrix includes:

computing a document-to-document self-matching similarity matrix from the third document matrix, and determining a self-matching weighting matrix from the self-matching similarity matrix;

based on the attention mechanism, performing a weighted operation on the third document matrix with the self-matching weighting matrix, to compute a self-matching attention matrix; and

concatenating the third document matrix and the self-matching attention matrix to obtain a concatenated matrix, and determining the self-matching matrix from the concatenated matrix.

Specifically, the self-matching similarity matrix of the document with itself is computed from the third document matrix and parameter matrices obtained in advance by training:

$$s_j^t = v^{\top}\tanh\left(W_v^{P} v_j^{P} + W_v^{\tilde{P}} v_t^{P}\right)$$

where v^T, W_v^P and W_v^{\tilde{P}} are parameter matrices obtained by random initialization, v^P denotes the vector representation of the third document matrix obtained in step S23, which fuses the question-document comparison information, and j denotes a word index.

The self-matching similarity matrix of the document with itself is normalized, giving the normalized matrix:

$$a_i^t = \frac{\exp\left(s_i^t\right)}{\sum_{j}\exp\left(s_j^t\right)}$$

where a_i^t denotes the normalized matrix.

From the vector representation of the third document matrix and the normalized matrix, the self-matching attention matrix, in which every word of the document attends to the whole document, is computed:

$$c_t = \sum_{i} a_i^t\, v_i^P$$

Its meaning is: the weighted sum of the weights and the corresponding word vectors of the third document matrix gives the self-matching attention matrix c_t, which represents, for every word of the document, a semantic vector over the whole document. The third document matrix and the self-matching attention matrix are concatenated, giving the concatenated matrix

$$[v_t^P;\ c_t]$$

Further, determining the self-matching matrix from the concatenated matrix includes:

determining the concatenated matrix as the self-matching matrix; or,

weighting the concatenated matrix based on a gate control mechanism, and determining the weighted matrix as the self-matching matrix.

Specifically, the obtained concatenated matrix may be determined directly as the self-matching matrix; or, to control the importance of different parts of the document content, a gate control mechanism may additionally be introduced to control the importance of the different parts adaptively. That is, a weighting matrix (denoted g_t) is first computed with the sigmoid function from the self-matching attention matrix c_t of every word of the document and the third document matrix, to control the importance of different parts of the document; then g_t is multiplied element-wise with the obtained concatenated matrix $[v_t^P; c_t]$ to obtain a new matrix, which is determined as the self-matching matrix:

$$g_t = \operatorname{sigmoid}\left(W_g\,[v_t^P;\ c_t]\right)$$

$$[v_t^P;\ c_t]^{*} = g_t \odot [v_t^P;\ c_t]$$

where W_g denotes a parameter matrix obtained by random initialization, $[v_t^P;\ c_t]$ denotes the obtained concatenated matrix, and $[v_t^P;\ c_t]^{*}$ denotes the self-matching matrix after processing by the gate control mechanism.

After the self-matching matrix is obtained, it is fed as input into a bi-directional recurrent neural network (BiRNN), and the output values of the hidden-layer nodes of the BiRNN are determined as the fourth document matrix:

$$h_t^P = \operatorname{BiRNN}\left(h_{t-1}^P,\ [v_t^P;\ c_t]^{*}\right)$$

where h_t^P is the output of the BiRNN hidden-layer nodes at time t.

Step S24 amounts to reading the document once more, a "third reading", on top of the question-aware understanding of the document and the document-aware understanding of the question. The word representations of the document content are refined using the context of the document content itself, so as to obtain more contextual information about the document content: words that are far apart within the same document content are also compared, which makes it possible to distinguish the current word from other words with similar meanings elsewhere in the document.
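A numpy sketch of the gated self-matching attention, written directly from the formulas above; the additive-attention weight shapes are assumptions, and the BiRNN that would consume the gated output is left out.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gated_self_matching(vP, Wv, Wv2, v, Wg):
    """vP: [n, d] third document matrix. Returns the gated self-matching
    matrix [v_t; c_t]* of shape [n, 2d] (to be fed into a BiRNN)."""
    # s[t, j] = v . tanh(Wv @ vP_j + Wv2 @ vP_t)  (additive self-attention)
    proj_j = vP @ Wv.T                       # [n, h]
    proj_t = vP @ Wv2.T                      # [n, h]
    s = np.tanh(proj_t[:, None] + proj_j[None, :]) @ v   # [n, n]
    a = softmax(s, axis=1)                   # normalized weights per word t
    c = a @ vP                               # c_t: whole-document semantic vector
    vc = np.concatenate([vP, c], axis=1)     # [v_t; c_t]
    g = 1.0 / (1.0 + np.exp(-(vc @ Wg.T)))   # gate g_t = sigmoid(Wg [v_t; c_t])
    return g * vc                            # [v_t; c_t]* = g_t * [v_t; c_t]

rng = np.random.default_rng(3)
n, d, h = 40, 256, 64
vP = rng.standard_normal((n, d))
out = gated_self_matching(vP,
                          rng.standard_normal((h, d)) * 0.05,
                          rng.standard_normal((h, d)) * 0.05,
                          rng.standard_normal(h) * 0.05,
                          rng.standard_normal((2 * d, 2 * d)) * 0.05)
print(out.shape)  # (40, 512)
```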
In step S25, determining, based on the pointer network and according to the fourth document matrix and the second question matrix, the extraction start position and the extraction end position in the document content includes:

restoring the second question matrix to obtain a restored question matrix;

for each piece of document content, computing an attention matrix from the fourth document matrix and the restored question matrix, the attention matrix being used to characterize the semantic representation of the document word vectors by the question word vectors, and including an attention matrix at a first time step and an attention matrix at a second time step;

for each piece of document content, computing a probability value for each document word from the attention matrix, determining the document word with the maximum probability value at the first time step as the extraction start position of that document content, and determining the document word with the maximum probability value at the second time step as the extraction end position of that document content; and

determining, from the products of the probability values of the extraction start positions and extraction end positions of the different pieces of document content, the document content to be selected, and determining the extraction start position and extraction end position of the selected document content as the extraction start position and extraction end position finally adopted.

Specifically, because the user question was formed by concatenating several copies of the single question originally asked by the user, the user question is first restored to the single question: the encoded second question matrix obtained in step S22 is restored to the single-question matrix. Then, for the fourth document matrix obtained in step S24, the attention matrix with respect to the single-question matrix is computed, and the start/end positions corresponding to the maximum probability values in the attention matrices are computed as the answer-extraction start/end positions. Because the document content was formed by concatenating the content of several related documents, several pairs of start and end positions are obtained accordingly; the product of the probability values of each pair of start and end positions is computed, and the pair with the largest product is selected as the final answer-extraction start/end positions. For example, if the document content was concatenated from the content of 5 related documents, 5 pairs of start and end positions are eventually obtained; the probability values of each of the 5 pairs are multiplied, giving 5 products, and the start and end positions corresponding to the largest product are chosen as the final answer-extraction start/end positions.

The detailed formula-level computation is as follows.

From the fourth document matrix, a semantic vector of every word vector of the document with respect to the document itself is computed, and this semantic vector is used as the input of an RNN, whose hidden-layer node outputs are obtained:

$$c_t = \sum_{i} a_i^t\, h_i^P$$

$$h_t^a = \operatorname{RNN}\left(h_{t-1}^a,\ c_t\right)$$

where h^P denotes the vector representation of the fourth document matrix obtained in step S24, c_t is the semantic vector at time t of every word vector of the document with respect to the document itself, and the initial state of $h^a$ is the question vector $r^Q$ generated by attention pooling over the question, where r^Q is computed as follows:

$$s_j = v^{\top}\tanh\left(W_u^{Q} u_j^{Q} + W_v^{Q} V_r^{Q}\right)$$

$$a_i = \frac{\exp\left(s_i\right)}{\sum_{j}\exp\left(s_j\right)}$$

$$r^{Q} = \sum_{i} a_i\, u_i^{Q}$$

where v^T, W_u^Q and W_v^Q are parameters obtained by random initialization, u^Q denotes the vector representation of the question matrix obtained after encoding in step S22 and already restored to the single-question length, s_j denotes the similarity matrix of the single question with itself, a_i denotes the normalized matrix obtained by normalizing the similarity matrix, and r^Q denotes the question vector obtained as the weighted sum over the single-question matrix.

The question matrix u^Q, obtained after encoding in step S22 and restored to the single-question length, is passed through the above similarity computation, normalization and weighting to give the question vector r^Q, which is used as the initial state of $h^a$ and passed into the RNN together with the semantic vector c_t of every word vector of the document with respect to the document itself.
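A small numpy sketch of this attention pooling; V_r^Q is treated as a randomly initialized parameter vector, an assumption carried over from similar pointer-network designs rather than stated explicitly here.

```python
import numpy as np

def question_vector(uQ, Wu, Wv, VrQ, v):
    """Attention-pool the restored question matrix uQ [m, d] into r^Q [d]."""
    # s_j = v . tanh(Wu @ uQ_j + Wv @ VrQ)
    s = np.tanh(uQ @ Wu.T + (Wv @ VrQ)[None, :]) @ v   # [m]
    a = np.exp(s - s.max()); a /= a.sum()              # normalized weights
    return a @ uQ                                      # r^Q = sum_i a_i u_i^Q

rng = np.random.default_rng(4)
m, d, h = 4, 256, 64
uQ = rng.standard_normal((m, d))
rQ = question_vector(uQ,
                     rng.standard_normal((h, d)) * 0.05,
                     rng.standard_normal((h, h)) * 0.05,
                     rng.standard_normal(h),            # V_r^Q (assumed param)
                     rng.standard_normal(h) * 0.05)
print(rQ.shape)  # (256,)
```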
From the fourth document matrix and the output values of the RNN hidden-layer nodes, the document-question similarity matrix is computed:

$$s_j^t = v^{\top}\tanh\left(W_h^{P} h_j^{P} + W_h^{a} h_{t-1}^{a}\right)$$

where h^P denotes the fourth document matrix, s_j^t denotes the similarity matrix between the updated document vectors and the restored single-question word vectors, and W_h^P and W_h^a are parameter matrices obtained by random initialization.

The document-question similarity matrix is normalized, giving the normalized document-question similarity matrix:

$$a_i^t = \frac{\exp\left(s_i^t\right)}{\sum_{j}\exp\left(s_j^t\right)}$$

where a_i^t denotes the normalized document-question similarity matrix.

The maximum of the normalized document-question similarity matrix is computed, and the position corresponding to the maximum is determined as the extraction start position or the extraction end position:

$$p^t = \arg\max\left(a_1^t, \ldots, a_n^t\right)$$

where p^t is the maximum of a^t and corresponds to the start/end position of the answer in the document content: the position of the maximum for a^1 is the answer-extraction start position, the position of the maximum for a^2 is the answer-extraction end position, and n denotes the document length.
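The following numpy sketch chains these formulas into a two-step pointer decode (t = 1 for the start, t = 2 for the end); the plain tanh cell used to update the answer state h^a is an illustrative stand-in for the actual recurrent cell.

```python
import numpy as np

def pointer_decode(hP, rQ, WhP, Wha, v, Wrnn, Urnn):
    """hP: [n, d] fourth document matrix; rQ: [d] initial answer state.
    Returns (start, end) indices and the product of their probabilities."""
    ha, picks = rQ, []
    for _ in range(2):                        # t = 1 (start), t = 2 (end)
        s = np.tanh(hP @ WhP.T + (Wha @ ha)[None, :]) @ v   # s_j^t
        a = np.exp(s - s.max()); a /= a.sum()               # a^t (softmax)
        picks.append((int(a.argmax()), float(a.max())))     # p^t = argmax
        c = a @ hP                                          # c_t
        ha = np.tanh(Wrnn @ c + Urnn @ ha)                  # RNN state update
    (start, p1), (end, p2) = picks
    return start, end, p1 * p2                # product scores this document

rng = np.random.default_rng(5)
n, d, h = 40, 256, 64
hP = rng.standard_normal((n, d))
start, end, score = pointer_decode(
    hP, rng.standard_normal(d),
    rng.standard_normal((h, d)) * 0.05, rng.standard_normal((h, d)) * 0.05,
    rng.standard_normal(h) * 0.05,
    rng.standard_normal((d, d)) * 0.05, rng.standard_normal((d, d)) * 0.05)
print(start, end, score)
```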
For example, suppose 5 groups of probability values are obtained, one per document: (p_s^1, p_e^1), (p_s^2, p_e^2), (p_s^3, p_e^3), (p_s^4, p_e^4) and (p_s^5, p_e^5), where p_s^i and p_e^i are the probability values of the start and end positions for the i-th document. The products of the 5 groups, p_s^1·p_e^1 through p_s^5·p_e^5, are computed, and the document words corresponding to the group with the largest product are taken as the answer-extraction start and end positions.

It should be noted that t in the above formulas denotes a time index, and in the formulas involved in step S25 t takes two values, corresponding to the first time step and the second time step: the document word corresponding to the maximum probability value at the first time step is determined as the extraction start position of the corresponding document content, and the document word corresponding to the maximum probability value at the second time step is determined as the extraction end position of the corresponding document content.

It should also be noted that in the above formulas the subscripts i and j both denote indices of words in the corresponding vector or matrix.
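Finally, the cross-document selection can be sketched in a few lines, assuming each document already yields a (start, end, p_start, p_end) tuple from a pointer decode like the one above; the names here are hypothetical.

```python
def best_span(candidates):
    """candidates: list of (doc_id, start, end, p_start, p_end), one per
    document; pick the span whose probability product is largest."""
    return max(candidates, key=lambda c: c[3] * c[4])

spans = [("doc1", 3, 7, 0.8, 0.6), ("doc2", 0, 2, 0.5, 0.9),
         ("doc3", 4, 4, 0.7, 0.7)]
print(best_span(spans))  # ('doc3', 4, 4, 0.7, 0.7): 0.49 beats 0.48 and 0.45
```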
It should be understood that although the steps in the flowcharts of FIG. 1 and FIG. 2 are displayed in the order indicated by the arrows, these steps are not necessarily executed in that order. Unless explicitly stated herein, there is no strict ordering constraint on their execution, and they may be executed in other orders. Moreover, at least some of the steps in FIG. 1 and FIG. 2 may comprise multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different moments, and whose execution order is not necessarily sequential: they may be executed in turn or alternately with other steps, or with at least some of the sub-steps or stages of other steps.
FIG. 4 is a schematic structural diagram of a deep-learning-based answer extraction apparatus according to another exemplary embodiment.

As shown in FIG. 4, the deep-learning-based answer extraction apparatus provided by this embodiment includes:

an obtaining module 41, configured to obtain a user question and obtain, according to the user question, document content related to the user question;

a processing module 42, configured to determine, based on a deep learning model, an extraction start position and an extraction end position in the document content; and

a display module 43, configured to determine the document content between the extraction start position and the extraction end position as the answer corresponding to the user question, and display the answer.

Further, referring to FIG. 5, the processing module 42 includes:

a first processing unit 421, configured to obtain a to-be-processed user question and to-be-processed document content from the user question and the document content respectively, segment the to-be-processed user question and the to-be-processed document content into words, and convert each word into a word vector, to obtain a first question matrix and a first document matrix;

a second processing unit 422, configured to process the first document matrix so that the processed first document matrix contains question information, and encode the processed first document matrix and the first question matrix respectively, to obtain a second document matrix and a second question matrix;

a third processing unit 423, configured to perform, based on the attention mechanism, interaction processing on the second document matrix and the second question matrix, to obtain a third document matrix;

a fourth processing unit 424, configured to perform, based on the attention mechanism, self-matching processing on the third document matrix, to obtain a fourth document matrix; and

a fifth processing unit 425, configured to determine, based on a pointer network and according to the fourth document matrix and the second question matrix, the extraction start position and the extraction end position in the document content.

Further, the first processing unit 421 is specifically configured to: concatenate all of the document content to obtain the to-be-processed document content; and/or repeat the user question multiple times and concatenate the repeated user questions to obtain the to-be-processed user question, where the number of repetitions equals the total number of documents in the document content.

Further, the second processing unit 422 is specifically configured to: determine word co-occurrence features and append them to the tail of the corresponding document word vectors in the first document matrix, to obtain the processed first document matrix.

Further, the word co-occurrence features include a first word co-occurrence feature and/or a second word co-occurrence feature, and the second processing unit 422 is specifically configured to: for each word in the to-be-processed document content, if the word is identical to at least one word in the to-be-processed user question, determine the first word co-occurrence feature of that word as a first value, and otherwise as a second value, where the first value and the second value are both fixed values indicating, respectively, that a word in the document content does or does not appear in the user question, and append the first word co-occurrence feature to the tail of the word vector corresponding to that word in the first document matrix; and/or compute similarity values between each word vector in the first document matrix and each word vector in the first question matrix, normalize the similarity values for each word vector in the first document matrix, and append the normalized similarity value, as the second word co-occurrence feature, to the tail of the corresponding word vector in the first document matrix.

Further, the second processing unit 422 is specifically configured to: take the processed first document matrix as the input of a preset first GRU network, process the processed first document matrix with the first GRU network, and determine the output of the output layer of the first GRU network as the second document matrix; and determine an input question matrix from the first question matrix, take the input question matrix as the input of a preset second GRU network, process the input question matrix with the second GRU network, and determine the output of the output layer of the second GRU network as the second question matrix.

Further, the second processing unit 422 is specifically configured to: if the first GRU network is different from the second GRU network, determine the first question matrix as the input question matrix; or, if the first GRU network is the same as the second GRU network, append preset features to the tail of every word vector in the first question matrix to obtain a concatenated question matrix, and determine the concatenated question matrix as the input question matrix, where the number of preset features equals the number of word co-occurrence features.

Further, the third processing unit 423 is specifically configured to: process, based on the attention mechanism, the second document matrix and the second question matrix to obtain an interaction matrix containing the document-question comparison information; take the interaction matrix as the input of a preset third GRU network, process the interaction matrix with the third GRU network, and determine the output of the output layer of the third GRU network as the third document matrix.

Further, the third processing unit 423 is specifically configured to: compute a document-question word-pair similarity matrix from the second document matrix and the second question matrix, and determine a first normalized matrix and a second normalized matrix from the word-pair similarity matrix; based on the attention mechanism, perform a weighted operation with the first normalized matrix and the second question matrix, and a weighted operation with the first normalized matrix, the second normalized matrix and the second document matrix, to compute a first interaction attention matrix and a second interaction attention matrix respectively; and concatenate, in order, the second document matrix, the first interaction attention matrix, the element-wise product of the second document matrix and the first interaction attention matrix, and the element-wise product of the second document matrix and the second interaction attention matrix, determining the concatenated matrix as the interaction matrix.

Further, the fourth processing unit 424 is specifically configured to: process, based on the attention mechanism, the third document matrix to obtain a self-matching matrix; take the self-matching matrix as the input of a preset bi-directional recurrent neural network, process the self-matching matrix with the bi-directional recurrent neural network, and determine the hidden-layer output of the bi-directional recurrent neural network as the fourth document matrix.

Further, the fourth processing unit 424 is specifically configured to: compute a document-to-document self-matching similarity matrix from the third document matrix, and determine a self-matching weighting matrix from the self-matching similarity matrix; based on the attention mechanism, perform a weighted operation on the third document matrix with the self-matching weighting matrix, to compute a self-matching attention matrix; and concatenate the third document matrix and the self-matching attention matrix to obtain a concatenated matrix, determining the self-matching matrix from the concatenated matrix.

Further, the fourth processing unit 424 is specifically configured to: determine the concatenated matrix as the self-matching matrix; or weight the concatenated matrix based on a gate control mechanism and determine the weighted matrix as the self-matching matrix.

Further, the fifth processing unit 425 is specifically configured to: restore the second question matrix to obtain a restored question matrix; for each piece of document content, compute, from the fourth document matrix and the restored question matrix, an attention matrix used to characterize the semantic representation of the document word vectors by the question word vectors, the attention matrix including an attention matrix at a first time step and an attention matrix at a second time step; for each piece of document content, compute a probability value for each document word from the attention matrix, determine the document word with the maximum probability value at the first time step as the extraction start position of that document content, and determine the document word with the maximum probability value at the second time step as the extraction end position of that document content; and determine, from the products of the probability values of the extraction start and end positions of the different pieces of document content, the document content to be selected, and determine its extraction start and end positions as those finally adopted.

In this embodiment, the obtaining module first obtains the user question and, according to the user question, the document content related to it; the processing module then determines, based on the deep learning model, the extraction start position and the extraction end position in the document content; and the display module determines the document content between the extraction start position and the extraction end position as the answer corresponding to the user question and displays the answer. With this answer extraction apparatus there is no need to hand-craft features or formulate various matching rules to extract the answer: the obtained user question and the document content related to it are fed directly into the deep learning model, and the most suitable answer matching the user question is obtained from the document content. This simplifies the answer-extraction process and improves answer accuracy, thereby greatly improving the efficiency and quality of automated customer service.

For the specific limitations of the deep-learning-based answer extraction apparatus, reference may be made to the limitations of the deep-learning-based answer extraction method above, which are not repeated here. Each module in the above apparatus may be implemented wholly or partly in software, hardware, or a combination thereof. The above modules may be embedded in, or independent of, the processor of a computer device in hardware form, or stored in the memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to each module.

It should be noted that for the parts not described in detail in this embodiment, reference may be made to the descriptions in the embodiments of the method, which are not repeated here.
The deep-learning-based answer extraction method provided by this application can be applied in the application environment shown in FIG. 6, in which a terminal 61 communicates with a server 62 over a network. The terminal 61 obtains a user question and, according to the user question, document content related to the user question; determines, based on a deep learning model, an extraction start position and an extraction end position in the document content; determines the document content between the extraction start position and the extraction end position as the answer corresponding to the user question; and displays the answer. The terminal 61 may be, but is not limited to, a personal computer, a notebook computer, a smartphone, a tablet computer or a portable wearable device, and the server 62 may be implemented as a stand-alone server or a server cluster composed of multiple servers.

In one embodiment, a computer device is provided, which may be the terminal 61 or the server 62 shown in FIG. 6; its internal structure may be as shown in FIG. 7. The computer device includes a processor, a memory, a network interface and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer-readable instructions and a database. The internal memory provides an environment for running the operating system and the computer-readable instructions in the non-volatile storage medium. The database of the computer device is used to store document content data related to user questions. The network interface of the computer device is used to communicate with external terminals over a network connection. The computer-readable instructions, when executed by the processor, implement a deep-learning-based answer extraction method.

Those skilled in the art can understand that the structure shown in FIG. 7 is only a block diagram of part of the structure related to the solution of this application and does not constitute a limitation on the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.

A computer device includes a memory and one or more processors, the memory storing computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to implement the steps of the deep-learning-based answer extraction method provided in any embodiment of this application.

One or more non-volatile computer-readable storage media store computer-readable instructions which, when executed by one or more processors, cause the one or more processors to implement the steps of the deep-learning-based answer extraction method provided in any embodiment of this application.
It can be understood that identical or similar parts in the above embodiments may be referred to mutually, and content not described in detail in some embodiments may be found in the identical or similar content of other embodiments.

It should be noted that in the description of this application the terms "first", "second", etc. are used for descriptive purposes only and cannot be understood as indicating or implying relative importance. Furthermore, in the description of this application, unless otherwise stated, "multiple" means at least two.

Any process or method description in a flowchart, or otherwise described herein, can be understood as representing a module, segment or portion of code comprising one or more executable instructions for implementing specific logical functions or steps of the process; and the scope of the preferred embodiments of this application includes additional implementations in which functions may be executed out of the order shown or discussed, including substantially simultaneously or in the reverse order according to the functions involved, as should be understood by those skilled in the art to which the embodiments of this application belong.

It should be understood that the parts of this application may be implemented in hardware, software, firmware or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction-execution system. For example, an implementation in hardware may, as in another embodiment, use any one or a combination of the following techniques known in the art: a discrete logic circuit with logic gate circuits for implementing logic functions on data signals, an application-specific integrated circuit with suitable combinational logic gate circuits, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.

Those of ordinary skill in the art can understand that all or part of the steps carried by the methods of the above embodiments can be completed by instructing related hardware through a program; the program may be stored in a computer-readable storage medium and, when executed, comprises one of the steps of the method embodiments or a combination thereof.

In addition, the functional units in the embodiments of this application may be integrated into one processing module, or each unit may exist physically on its own, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.

The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.

In the description of this specification, reference to the terms "one embodiment", "some embodiments", "an example", "a specific example" or "some examples" means that a specific feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of this application. In this specification, schematic expressions of the above terms do not necessarily refer to the same embodiment or example, and the specific features, structures, materials or characteristics described may be combined in a suitable manner in any one or more embodiments or examples.

Although the embodiments of this application have been shown and described above, it can be understood that the above embodiments are exemplary and cannot be construed as limitations on this application; those of ordinary skill in the art may make changes, modifications, substitutions and variations to the above embodiments within the scope of this application.

Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments may be completed by computer-readable instructions instructing related hardware; the computer-readable instructions may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, database or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).

The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features of the above embodiments have been described; however, as long as the combinations of these technical features involve no contradiction, they should be considered within the scope of this specification.

The above embodiments express only several implementations of this application, and their descriptions are relatively specific and detailed, but they cannot therefore be understood as limiting the scope of the invention patent. It should be pointed out that those of ordinary skill in the art may make several variations and improvements without departing from the concept of this application, and these all fall within the scope of protection of this application. Therefore, the scope of protection of this application's patent shall be subject to the appended claims.

Claims (20)

  1. A deep-learning-based answer extraction method, comprising:
    obtaining a user question, and obtaining, according to the user question, document content related to the user question;
    determining, based on a deep learning model, an extraction start position and an extraction end position in the document content; and
    determining the document content between the extraction start position and the extraction end position as the answer corresponding to the user question, and displaying the answer.
  2. The method according to claim 1, wherein determining, based on the deep learning model, the extraction start position and the extraction end position in the document content comprises:
    obtaining a to-be-processed user question and to-be-processed document content from the user question and the document content respectively, segmenting the to-be-processed user question and the to-be-processed document content into words, and converting each word into a word vector, to obtain a first question matrix and a first document matrix;
    processing the first document matrix so that the processed first document matrix contains question information, and encoding the processed first document matrix and the first question matrix respectively, to obtain a second document matrix and a second question matrix;
    performing, based on an attention mechanism, interaction processing on the second document matrix and the second question matrix, to obtain a third document matrix;
    performing, based on the attention mechanism, self-matching processing on the third document matrix, to obtain a fourth document matrix; and
    determining, based on a pointer network and according to the fourth document matrix and the second question matrix, the extraction start position and the extraction end position in the document content.
  3. The method according to claim 2, wherein obtaining the to-be-processed user question and the to-be-processed document content from the user question and the document content respectively comprises:
    concatenating all of the document content to obtain the to-be-processed document content; and/or,
    repeating the user question multiple times and concatenating the repeated user questions to obtain the to-be-processed user question, wherein the number of times the user question is repeated equals the total number of documents in the document content.
  4. The method according to claim 2, wherein processing the first document matrix so that the processed first document matrix contains question information comprises:
    determining word co-occurrence features, and appending the word co-occurrence features to the tail of the corresponding document word vectors in the first document matrix, to obtain the processed first document matrix.
  5. The method according to claim 4, wherein the word co-occurrence features comprise a first word co-occurrence feature and/or a second word co-occurrence feature, and determining the word co-occurrence features and appending them to the tail of the corresponding document word vectors in the first document matrix comprises:
    for each word in the to-be-processed document content, if the word is identical to at least one word in the to-be-processed user question, determining the first word co-occurrence feature of that word as a first value, and otherwise as a second value, wherein the first value and the second value are both fixed values indicating, respectively, that a word in the document content does or does not appear in the user question, and appending the first word co-occurrence feature to the tail of the word vector corresponding to that word in the first document matrix; and/or,
    computing similarity values between each word vector in the first document matrix and each word vector in the first question matrix, normalizing the similarity values for each word vector in the first document matrix, and appending the normalized similarity value, as the second word co-occurrence feature, to the tail of the corresponding word vector in the first document matrix.
  6. The method according to claim 2, wherein encoding the processed first document matrix and the first question matrix respectively to obtain the second document matrix and the second question matrix comprises:
    taking the processed first document matrix as the input of a preset first GRU network, processing the processed first document matrix with the first GRU network, and determining the output of the output layer of the first GRU network as the second document matrix; and
    determining an input question matrix from the first question matrix, taking the input question matrix as the input of a preset second GRU network, processing the input question matrix with the second GRU network, and determining the output of the output layer of the second GRU network as the second question matrix.
  7. The method according to claim 6, wherein determining the input question matrix from the first question matrix comprises:
    if the first GRU network is different from the second GRU network, determining the first question matrix as the input question matrix; or,
    if the first GRU network is the same as the second GRU network, appending preset features to the tail of every word vector in the first question matrix to obtain a concatenated question matrix, and determining the concatenated question matrix as the input question matrix, wherein the number of preset features equals the number of word co-occurrence features.
  8. The method according to claim 2, wherein performing, based on the attention mechanism, interaction processing on the second document matrix and the second question matrix to obtain the third document matrix comprises:
    processing the second document matrix and the second question matrix based on the attention mechanism to obtain an interaction matrix, such that the interaction matrix contains comparison information between the document and the question; and
    taking the interaction matrix as the input of a preset third GRU network, processing the interaction matrix with the third GRU network, and determining the output of the output layer of the third GRU network as the third document matrix.
  9. The method according to claim 8, wherein processing the second document matrix and the second question matrix based on the attention mechanism to obtain the interaction matrix containing the document-question comparison information comprises:
    computing a document-question word-pair similarity matrix from the second document matrix and the second question matrix, and determining a first normalized matrix and a second normalized matrix from the word-pair similarity matrix;
    based on the attention mechanism, performing a weighted operation with the first normalized matrix and the second question matrix, and performing a weighted operation with the first normalized matrix, the second normalized matrix and the second document matrix, to compute a first interaction attention matrix and a second interaction attention matrix respectively; and
    concatenating, in order, the second document matrix, the first interaction attention matrix, the element-wise product matrix of the second document matrix and the first interaction attention matrix, and the element-wise product matrix of the second document matrix and the second interaction attention matrix, and determining the concatenated matrix as the interaction matrix.
  10. The method according to claim 2, wherein performing, based on the attention mechanism, self-matching processing on the third document matrix to obtain the fourth document matrix comprises:
    processing the third document matrix based on the attention mechanism to obtain a self-matching matrix; and
    taking the self-matching matrix as the input of a preset bi-directional recurrent neural network, processing the self-matching matrix with the bi-directional recurrent neural network, and determining the hidden-layer output of the bi-directional recurrent neural network as the fourth document matrix.
  11. The method according to claim 10, wherein processing the third document matrix based on the attention mechanism to obtain the self-matching matrix comprises:
    computing a document-to-document self-matching similarity matrix from the third document matrix, and determining a self-matching weighting matrix from the self-matching similarity matrix;
    based on the attention mechanism, performing a weighted operation on the third document matrix with the self-matching weighting matrix, to compute a self-matching attention matrix; and
    concatenating the third document matrix and the self-matching attention matrix to obtain a concatenated matrix, and determining the self-matching matrix from the concatenated matrix.
  12. The method according to claim 11, wherein determining the self-matching matrix from the concatenated matrix comprises:
    determining the concatenated matrix as the self-matching matrix; or,
    weighting the concatenated matrix based on a gate control mechanism, and determining the weighted matrix as the self-matching matrix.
  13. The method according to claim 2, wherein determining, based on the pointer network and according to the fourth document matrix and the second question matrix, the extraction start position and the extraction end position in the document content comprises:
    restoring the second question matrix to obtain a restored question matrix;
    for each piece of document content, computing an attention matrix from the fourth document matrix and the restored question matrix, the attention matrix being used to characterize the semantic representation of the document word vectors by the question word vectors and comprising an attention matrix at a first time step and an attention matrix at a second time step;
    for each piece of document content, computing a probability value for each document word from the attention matrix, determining the document word corresponding to the maximum probability value at the first time step as the extraction start position of the corresponding document content, and determining the document word corresponding to the maximum probability value at the second time step as the extraction end position of the corresponding document content; and
    determining, from the products of the probability values corresponding to the extraction start positions and extraction end positions of the different pieces of document content, the document content to be selected, and determining the extraction start position and extraction end position corresponding to the selected document content as the extraction start position and extraction end position finally adopted.
  14. A deep-learning-based answer extraction apparatus, comprising:
    an obtaining module, configured to obtain a user question and obtain, according to the user question, document content related to the user question;
    a processing module, configured to determine, based on a deep learning model, an extraction start position and an extraction end position in the document content; and
    a display module, configured to determine the document content between the extraction start position and the extraction end position as the answer corresponding to the user question, and display the answer.
  15. The apparatus according to claim 14, wherein the processing module comprises:
    a first processing unit, configured to obtain a to-be-processed user question and to-be-processed document content from the user question and the document content respectively, segment the to-be-processed user question and the to-be-processed document content into words, and convert each word into a word vector, to obtain a first question matrix and a first document matrix;
    a second processing unit, configured to process the first document matrix so that the processed first document matrix contains question information, and encode the processed first document matrix and the first question matrix respectively, to obtain a second document matrix and a second question matrix;
    a third processing unit, configured to perform, based on an attention mechanism, interaction processing on the second document matrix and the second question matrix, to obtain a third document matrix;
    a fourth processing unit, configured to perform, based on the attention mechanism, self-matching processing on the third document matrix, to obtain a fourth document matrix; and
    a fifth processing unit, configured to determine, based on a pointer network and according to the fourth document matrix and the second question matrix, the extraction start position and the extraction end position in the document content.
  16. The apparatus according to claim 15, wherein the first processing unit is configured to:
    concatenate all of the document content to obtain the to-be-processed document content; and/or,
    repeat the user question multiple times and concatenate the repeated user questions to obtain the to-be-processed user question, wherein the number of times the user question is repeated equals the total number of documents in the document content.
  17. The apparatus according to claim 15, wherein the second processing unit is configured to:
    determine word co-occurrence features, and append the word co-occurrence features to the tail of the corresponding document word vectors in the first document matrix, to obtain the processed first document matrix.
  18. The apparatus according to claim 17, wherein the word co-occurrence features comprise a first word co-occurrence feature and/or a second word co-occurrence feature, and the second processing unit is configured to:
    for each word in the to-be-processed document content, if the word is identical to at least one word in the to-be-processed user question, determine the first word co-occurrence feature of that word as a first value, and otherwise as a second value, wherein the first value and the second value are both fixed values indicating, respectively, that a word in the document content does or does not appear in the user question, and append the first word co-occurrence feature to the tail of the word vector corresponding to that word in the first document matrix; and/or,
    compute similarity values between each word vector in the first document matrix and each word vector in the first question matrix, normalize the similarity values for each word vector in the first document matrix, and append the normalized similarity value, as the second word co-occurrence feature, to the tail of the corresponding word vector in the first document matrix.
  19. A computer device, comprising a memory and one or more processors, the memory storing computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the following steps:
    obtaining a user question, and obtaining, according to the user question, document content related to the user question;
    determining, based on a deep learning model, an extraction start position and an extraction end position in the document content; and
    determining the document content between the extraction start position and the extraction end position as the answer corresponding to the user question, and displaying the answer.
  20. One or more non-volatile computer-readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
    obtaining a user question, and obtaining, according to the user question, document content related to the user question;
    determining, based on a deep learning model, an extraction start position and an extraction end position in the document content; and
    determining the document content between the extraction start position and the extraction end position as the answer corresponding to the user question, and displaying the answer.
PCT/CN2020/075553 2019-03-22 2020-02-17 基于深度学习的答案抽取方法、装置、计算机设备和存储介质 WO2020192307A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910225135.0 2019-03-22
CN201910225135.0A CN109977404A (zh) 2019-03-22 2019-03-22 基于深度学习的答案抽取方法、装置和存储介质

Publications (1)

Publication Number Publication Date
WO2020192307A1 true WO2020192307A1 (zh) 2020-10-01

Family

ID=67080278

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/075553 WO2020192307A1 (zh) 2019-03-22 2020-02-17 基于深度学习的答案抽取方法、装置、计算机设备和存储介质

Country Status (2)

Country Link
CN (1) CN109977404A (zh)
WO (1) WO2020192307A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112417094A (zh) * 2020-11-17 2021-02-26 华东理工大学 基于网络文本的答案选择方法、装置、服务器及存储介质
CN112541350A (zh) * 2020-12-04 2021-03-23 支付宝(杭州)信息技术有限公司 一种变种文本还原方法、装置以及设备

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109977404A (zh) * 2019-03-22 2019-07-05 深圳追一科技有限公司 基于深度学习的答案抽取方法、装置和存储介质
CN110825870B (zh) * 2019-10-31 2023-07-14 腾讯科技(深圳)有限公司 文档摘要的获取方法和装置、存储介质及电子装置
CN111078854B (zh) * 2019-12-13 2023-10-27 北京金山数字娱乐科技有限公司 问答预测模型的训练方法及装置、问答预测方法及装置

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109376222A (zh) * 2018-09-27 2019-02-22 国信优易数据有限公司 问答匹配度计算方法、问答自动匹配方法及装置
CN109977404A (zh) * 2019-03-22 2019-07-05 深圳追一科技有限公司 基于深度学习的答案抽取方法、装置和存储介质

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11379736B2 (en) * 2016-05-17 2022-07-05 Microsoft Technology Licensing, Llc Machine comprehension of unstructured text
US10572595B2 (en) * 2017-04-13 2020-02-25 Baidu Usa Llc Global normalized reader systems and methods
CN108415977B (zh) * 2018-02-09 2022-02-15 华南理工大学 一个基于深度神经网络及强化学习的生成式机器阅读理解方法
CN108959246B (zh) * 2018-06-12 2022-07-12 北京慧闻科技(集团)有限公司 基于改进的注意力机制的答案选择方法、装置和电子设备
CN109492227A (zh) * 2018-11-16 2019-03-19 大连理工大学 一种基于多头注意力机制和动态迭代的机器阅读理解方法

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109376222A (zh) * 2018-09-27 2019-02-22 国信优易数据有限公司 问答匹配度计算方法、问答自动匹配方法及装置
CN109977404A (zh) * 2019-03-22 2019-07-05 深圳追一科技有限公司 基于深度学习的答案抽取方法、装置和存储介质

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WANG, WENHUI ET AL.: "Gated Self-Matching Networks for Reading Comprehension and Question Answering", PROCEEDINGS OF THE 55TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, vol. 1, 31 July 2017 (2017-07-31), pages 189 - 198, XP055738337 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112417094A (zh) * 2020-11-17 2021-02-26 华东理工大学 基于网络文本的答案选择方法、装置、服务器及存储介质
CN112417094B (zh) * 2020-11-17 2024-04-05 华东理工大学 基于网络文本的答案选择方法、装置、服务器及存储介质
CN112541350A (zh) * 2020-12-04 2021-03-23 支付宝(杭州)信息技术有限公司 一种变种文本还原方法、装置以及设备

Also Published As

Publication number Publication date
CN109977404A (zh) 2019-07-05

Similar Documents

Publication Publication Date Title
WO2020192307A1 (zh) 基于深度学习的答案抽取方法、装置、计算机设备和存储介质
CN112732911B (zh) 基于语义识别的话术推荐方法、装置、设备及存储介质
US11816442B2 (en) Multi-turn dialogue response generation with autoregressive transformer models
WO2021027533A1 (zh) 文本语义识别方法、装置、计算机设备和存储介质
US10650311B2 (en) Suggesting resources using context hashing
CN114556443A (zh) 使用基于注意力的融合网络的多媒体数据语义分析系统和方法
CN111695415A (zh) 图像识别模型的构建方法、识别方法及相关设备
WO2024011814A1 (zh) 一种图文互检方法、系统、设备及非易失性可读存储介质
CN112307168B (zh) 基于人工智能的问诊会话处理方法、装置和计算机设备
WO2020151689A1 (zh) 对话生成方法、装置、设备及存储介质
WO2021204017A1 (zh) 文本意图识别方法、装置以及相关设备
WO2021120779A1 (zh) 一种基于人机对话的用户画像构建方法、系统、终端及存储介质
CN110399472B (zh) 面试提问提示方法、装置、计算机设备及存储介质
CN110990555A (zh) 端到端检索式对话方法与系统及计算机设备
CN112699215B (zh) 基于胶囊网络与交互注意力机制的评级预测方法及系统
CN115098700A (zh) 知识图谱嵌入表示方法及装置
US20240037939A1 (en) Contrastive captioning for image groups
CN114492451A (zh) 文本匹配方法、装置、电子设备及计算机可读存储介质
WO2021217866A1 (zh) 用于ai智能面试的识别的方法、装置、计算机设备及存储介质
CN111460113A (zh) 一种数据交互方法及相关设备
CN115510193A (zh) 查询结果向量化方法、查询结果确定方法及相关装置
CN112131363B (zh) 自动问答方法、装置、设备及存储介质
CN112668343A (zh) 文本重写方法以及电子设备、存储装置
CN116991919B (zh) 结合平台数据库的业务数据检索方法及人工智能系统
US11893352B2 (en) Dependency path reasoning for measurement extraction

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20776497

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 04.02.2022)

122 Ep: pct application non-entry in european phase

Ref document number: 20776497

Country of ref document: EP

Kind code of ref document: A1