WO2021176714A1 - Learning device, information processing device, learning method, information processing method, and program - Google Patents
- Publication number: WO2021176714A1 (application PCT/JP2020/009806)
- Authority: WIPO (PCT)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/907—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/908—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
Description
- the present invention relates to a learning device, an information processing device, a learning method, an information processing method, and a program.
- In Non-Patent Document 1, a machine reading comprehension model that presents the basis for its answer has been proposed to address this problem.
- One embodiment of the present invention has been made in view of the above points, and an object thereof is to realize machine reading comprehension capable of presenting the basis of an answer.
- The learning device includes: a basis extraction means that takes as input a text and a question related to the text and, using the model parameters of a first neural network, calculates for each character string included in the text a basis score representing the plausibility that the character string is the basis of the answer to the question, and extracts from the text a first set representing the set of character strings on which the answer is based by sampling from a predetermined distribution having the basis score as a parameter; an answer extraction means that takes as input the question and the first set and extracts the answer from the first set using the model parameters of a second neural network; and a first learning means that learns the model parameters of the first neural network and the model parameters of the second neural network by computing the gradient by error backpropagation, using a first loss between the answer and the true answer to the question together with continuous relaxation.
- Machine reading comprehension is realized by a machine reading comprehension model composed of neural networks or the like.
- In this embodiment, a machine reading comprehension model that can present the basis of its answer is called interpretable, defined as follows.
- That the machine reading comprehension model is interpretable means that it is composed of two models with the following inputs and outputs: a basis model that takes the reference text and the question as input and outputs the basis, and an answer model that takes the basis and the question as input and outputs the answer.
- In this embodiment, the basis is a set of sentences.
- However, the basis is not limited to this; it may be a set of character strings longer than a sentence (for example, paragraphs) or a set of character strings shorter than a sentence (for example, phrases).
- In the interpretable machine reading comprehension model, only the character strings included in the basis, among the character strings in the reference text, are input to the answer model; information other than the basis (for example, the hidden states of the basis model) is not used by the answer model. This has the following advantages: (1) the basis of the answer can be presented in a strict sense; (2) since the answer model sees only the basis and the question, the reason for the predicted answer can be confined to a sufficiently short character string taken from the reference text; (3) since the input of the answer model is short, the answer model can tolerate computationally expensive processing; and (4) with the unsupervised learning described later, the basis needed for the model to answer with high accuracy can be learned without manual annotation.
- The present embodiment covers both learning, in which the parameters of the machine reading comprehension model (that is, the parameters of the basis model and the parameters of the answer model) are learned, and inference, in which machine reading comprehension is performed by the model using the learned parameters.
- Two learning methods are described: supervised learning and unsupervised learning. Accordingly, the question answering device 10 is described below at "inference time", "learning (supervised learning) time", and "learning (unsupervised learning) time".
- FIG. 1 is a diagram showing an example of the overall configuration of the question answering device 10 at the time of inference.
- The question answering device 10 at inference time includes a basis extraction processing unit 101 and an answer extraction processing unit 102, which realize the machine reading comprehension model; a basis model parameter storage unit 201 that stores the parameters of the basis model (hereinafter, "basis model parameters"); and an answer model parameter storage unit 202 that stores the parameters of the answer model (hereinafter, "answer model parameters").
- The basis extraction processing unit 101 is realized by the basis model, and outputs the basis R̂ by taking the reference text P and the question Q as input and using the learned basis model parameters stored in the basis model parameter storage unit 201.
- The basis extraction processing unit 101 includes a language understanding unit 111 and a basis extraction unit 112.
- The language understanding unit 111 takes the reference text P and the question Q as input, and outputs the question vector q and the set {s_i} of sentence vectors for all sentences in the reference text P.
- The basis extraction unit 112 takes the question vector q and the sentence vector set {s_i} as input, and outputs the basis R̂.
- The answer extraction processing unit 102 is realized by the answer model, and outputs the answer Â by taking the basis R̂ and the question Q as input and using the learned answer model parameters stored in the answer model parameter storage unit 202.
- the answer extraction processing unit 102 includes a language understanding unit 121 and an answer extraction unit 122.
- The language understanding unit 121 takes the basis R̂ and the question Q as input and outputs the vector sequence H.
- The answer extraction unit 122 takes the vector sequence H as input and outputs the answer Â (more precisely, scores for the start point and the end point of the answer range within the basis R̂).
- In the present embodiment, the basis model parameter storage unit 201 and the answer model parameter storage unit 202 are different storage units, but they may be the same storage unit. Further, among the basis model parameters and the answer model parameters, the parameters used by the language understanding unit 111 and the parameters used by the language understanding unit 121 may be the same (that is, the parameters may be shared by the language understanding units 111 and 121).
- FIG. 2 is a flowchart showing an example of inference processing according to the present embodiment.
- First, the language understanding unit 111 of the basis extraction processing unit 101 takes the reference text P and the question Q as input, and outputs the question vector q and the sentence vector set {s_i} using the learned basis model parameters stored in the basis model parameter storage unit 201 (step S101).
- Specifically, the language understanding unit 111 inputs the reference text P and the question Q into BERT (Bidirectional Encoder Representations from Transformers) as the token sequence ['[CLS_Q]'; question; '[SEP_Q]'; '[CLS_P]'; sentence 1; '[SEP_P]'; ...; '[CLS_P]'; sentence n; '[SEP_P]'].
- The language understanding unit 111 then takes the vector at the position corresponding to '[CLS_Q]' in the BERT output as the question vector q ∈ R^d, and the vector at the position corresponding to the i-th '[CLS_P]' as the sentence vector s_i ∈ R^d of the i-th sentence. Here, d is the dimension of the BERT output, and R^d denotes d-dimensional real space. In this way, the question vector q and the sentence vector set {s_i} are obtained.
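- The token sequence and vector pooling described above can be sketched as follows (a minimal illustration, not the patent's implementation: the BERT encoder is replaced by a stand-in that returns position indices, and all function names are hypothetical).

```python
# Illustrative sketch: assemble the BERT input described above and read off
# the question / sentence vectors from the '[CLS_Q]' and '[CLS_P]' positions.

def build_token_sequence(question_tokens, sentences):
    """['[CLS_Q]'; question; '[SEP_Q]'; '[CLS_P]'; sent 1; '[SEP_P]'; ...]"""
    seq = ['[CLS_Q]'] + list(question_tokens) + ['[SEP_Q]']
    for sent in sentences:
        seq += ['[CLS_P]'] + list(sent) + ['[SEP_P]']
    return seq

def pool_vectors(token_seq, encoder_output):
    """q = output at '[CLS_Q]'; s_i = output at the i-th '[CLS_P]'."""
    q = encoder_output[token_seq.index('[CLS_Q]')]
    s = [encoder_output[pos] for pos, tok in enumerate(token_seq)
         if tok == '[CLS_P]']
    return q, s

tokens = build_token_sequence(['who', 'won', '?'],
                              [['alice', 'won', '.'], ['bob', 'lost', '.']])
# stand-in for BERT output: each "vector" is just its position index
outputs = list(range(len(tokens)))
q, s = pool_vectors(tokens, outputs)
```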
- For BERT, see, for example, Reference 1: Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova, "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding", NAACL-HLT 2019.
- If the reference text P is too long to input into BERT at once, the reference text may be divided, and each of the divided reference texts (together with the question Q) may be input.
- In that case, the question vector q may be the average of the question vectors obtained from the divided reference texts.
- Next, the basis extraction unit 112 of the basis extraction processing unit 101 takes the question vector q and the sentence vector set {s_i} as input, and outputs the basis R̂ using the learned basis model parameters stored in the basis model parameter storage unit 201 (step S102).
- Here, W_p ∈ R^(d×d) is a learned parameter included in the learned basis model parameters (that is, W_p is a parameter to be learned in the learning process described later), and R^(d×d) denotes (d×d)-dimensional real space.
- Step 2: Next, the basis extraction unit 112 extracts a sentence r̂_t as follows, where S is the set of all sentences and R̂_(t-1) is the set of sentences extracted up to time t-1. That is, the basis extraction unit 112 extracts the sentence with the highest score among the sentences not yet extracted.
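- The greedy choice in Step 2 can be sketched as follows (illustrative only: the scores here are toy numbers, whereas in the model they are computed from the question vector and W_p).

```python
# Illustrative sketch of Step 2: at each time t, pick the sentence with the
# highest basis score among the sentences not yet extracted (S \ R_{t-1}).

def extract_next(scores, extracted):
    """Return the index of the best not-yet-extracted sentence."""
    candidates = {i: sc for i, sc in enumerate(scores) if i not in extracted}
    return max(candidates, key=candidates.get)

scores = [0.1, 0.9, 0.4]   # toy basis scores for sentences 0..2
extracted = {1}            # sentence 1 was extracted at an earlier time step
r_t = extract_next(scores, extracted)
```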
- Step 3: Next, the basis extraction unit 112 determines whether the sentence extracted in Step 2 is the EOE sentence s_EOE. If the sentence extracted in Step 2 is not the EOE sentence s_EOE, Step 4 is executed; if it is the EOE sentence s_EOE, the process is terminated.
- Here, the sentence vector s_EOE is a learned parameter included in the learned basis model parameters (that is, s_EOE is a parameter to be learned in the learning process described later).
- Step 4: The basis extraction unit 112 updates the question vector q_t as follows, using the sentence vector of the sentence extracted in Step 2 above.
- The question vector q_t represents the information still missing in order to answer the question.
- The initial state q_0 holds all the information needed to answer the question, and by Step 4 above, the information in the extracted sentence r̂_t is expected to be removed from q_t by the GRU.
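- The Step 4 update can be sketched as follows (a toy, element-wise GRU with shared scalar weights; the actual GRU uses learned weight matrices, so this only shows the gating structure by which extracted-sentence information can be folded out of q_t).

```python
import math

# Toy GRU step for q_t = GRU(q_{t-1}, s(r_t)): the extracted sentence's
# vector updates the "missing information" vector. Scalar weights wz/wr/wh
# are illustrative stand-ins for learned weight matrices.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(q_prev, s, wz=0.5, wr=0.5, wh=0.5):
    q_new = []
    for qi, si in zip(q_prev, s):
        z = sigmoid(wz * (qi + si))        # update gate
        r = sigmoid(wr * (qi + si))        # reset gate
        h = math.tanh(wh * (r * qi + si))  # candidate state
        q_new.append((1 - z) * qi + z * h)
    return q_new

q0 = [1.0, -0.5, 0.2]        # initial question vector (toy)
s_hat = [0.3, 0.1, -0.4]     # vector of the sentence extracted at time t
q1 = gru_step(q0, s_hat)
```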
- Next, the language understanding unit 121 of the answer extraction processing unit 102 takes the basis R̂ and the question Q as input, and outputs the vector sequence H using the learned answer model parameters stored in the answer model parameter storage unit 202 (step S103).
- Specifically, the language understanding unit 121 inputs the basis R̂ and the question Q into BERT as the token sequence ['[CLS]'; question; '[SEP]'; sentence r_1; ...; sentence r_T; '[SEP]'].
- Here, '[CLS]' and '[SEP]' are special tokens, and T is the number of sentences included in the basis R̂.
- A pre-trained language model other than BERT may also be used.
- Here, k is the sequence length, and R^(k×d) denotes (k×d)-dimensional real space.
- Next, the answer extraction unit 122 of the answer extraction processing unit 102 takes the vector sequence H as input and outputs the answer Â using the learned answer model parameters stored in the answer model parameter storage unit 202 (step S104).
- Specifically, the answer extraction unit 122 converts the vector sequence H into answer scores by the following linear transformation.
- Here, W_a ∈ R^(2×d) and b_a ∈ R^2 are learned parameters included in the learned answer model parameters (that is, W_a and b_a are parameters to be learned in the learning process described later); R^(2×d) denotes (2×d)-dimensional real space, and R^2 denotes two-dimensional real space.
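- The answer extraction head can be sketched as follows (illustrative: the start/end scores would come from the linear transformation W_a H + b_a; here they are toy numbers, and the span search shown is a common way to pick the answer range, not necessarily the patent's exact decoding).

```python
# Illustrative sketch: each token gets a start score and an end score (the
# two rows of W_a H + b_a); the answer span maximizes start + end with
# start <= end.

def best_span(start_scores, end_scores):
    """Return (i, j) maximizing start_scores[i] + end_scores[j], i <= j."""
    best, best_ij = float('-inf'), (0, 0)
    for i, s_val in enumerate(start_scores):
        for j in range(i, len(end_scores)):
            if s_val + end_scores[j] > best:
                best, best_ij = s_val + end_scores[j], (i, j)
    return best_ij

start = [0.1, 2.0, 0.3, 0.0]   # toy start scores per token
end = [0.2, 0.1, 1.5, 0.4]     # toy end scores per token
span = best_span(start, end)
```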
- As described above, the question answering device 10 at inference time can obtain the answer Â by taking the reference text P and the question Q as input. Moreover, at this time, the question answering device 10 can also obtain the basis R̂ of the answer Â (that is, the set of sentences on which Â is based). The answer Â and its basis R̂ may be output to any destination inside or outside the question answering device 10 (for example, a display, a storage device, or another device connected via a communication network).
- At learning (supervised learning) time, the question answering device 10 takes as input a set of training data (a training data set), each item of which includes a reference text P, a question Q related to P, a correct answer A indicating the answer range of the true answer to Q, and a correct answer basis R indicating the true basis of the correct answer A.
- FIG. 3 is a diagram showing an example of the overall configuration of the question answering device 10 during learning (supervised learning). Note that FIG. 3 mainly describes the differences from the time of inference, and omits the description of the same components as those at the time of inference.
- As shown in FIG. 3, the question answering device 10 at learning time includes the basis extraction processing unit 101 and the answer extraction processing unit 102, which realize the machine reading comprehension model; a parameter learning unit 103 that learns the basis model parameters and the answer model parameters; the basis model parameter storage unit 201, which stores the basis model parameters; and the answer model parameter storage unit 202, which stores the answer model parameters.
- At learning time, the basis model parameter storage unit 201 stores basis model parameters that are still being learned, and the answer model parameter storage unit 202 likewise stores answer model parameters that are still being learned.
- The parameter learning unit 103 learns the basis model parameters using the error (loss) between the basis R̂ and the correct answer basis R, and learns the answer model parameters using the error (loss) between the answer Â and the correct answer A.
- FIG. 4 is a flowchart showing an example of supervised learning processing according to the present embodiment.
- In the following, the basis model parameters and the answer model parameters are learned by online learning, but this is only an example; batch learning, mini-batch learning, and the like can also be applied.
- the parameter learning unit 103 selects one training data (that is, a set of the reference text P, the question Q, the correct answer A, and the correct answer basis R) from the training data set as the processing target (step S201).
- Next, the language understanding unit 111 of the basis extraction processing unit 101 takes as input the reference text P and the question Q included in the training data selected in step S201, and outputs the question vector q and the sentence vector set {s_i} using the basis model parameters being learned that are stored in the basis model parameter storage unit 201 (step S202).
- The language understanding unit 111 outputs the question vector q and the sentence vector set {s_i} by performing the same processing as in step S101 of FIG. 2.
- Next, the basis extraction unit 112 of the basis extraction processing unit 101 takes the question vector q and the sentence vector set {s_i} as input, and outputs the basis R̂ using the basis model parameters being learned that are stored in the basis model parameter storage unit 201 (step S203).
- At this time, the basis extraction unit 112 extracts the sentences r̂_t with teacher forcing. That is, the basis extraction unit 112 extracts r̂_t as follows.
- The basis model is thereby expected to extract (select) sentences in the order in which they contain important information for the question Q.
- Next, the parameter learning unit 103 calculates, as the loss L_R, the average of the negative log-likelihoods of the basis model extracting the correct sentence at each time t (step S204). That is, the parameter learning unit 103 calculates the loss L_R as follows.
- Here, Pr(i; R̂_(t-1)) is the probability that sentence i is output at time t.
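- The loss L_R can be sketched as follows (illustrative: probs stands in for the model's distribution Pr(i; R̂_(t-1)) over sentences at each time step).

```python
import math

# Illustrative sketch of L_R: the average over time steps of the negative
# log-likelihood of the correct (teacher-forced) sentence.

def basis_loss(probs_per_step, gold_indices):
    nll = [-math.log(p[g]) for p, g in zip(probs_per_step, gold_indices)]
    return sum(nll) / len(nll)

probs = [[0.7, 0.2, 0.1],   # toy Pr(i; .) at time t = 1
         [0.1, 0.8, 0.1]]   # toy Pr(i; .) at time t = 2
gold = [0, 1]               # correct sentence index at each time step
loss = basis_loss(probs, gold)
```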
- Next, the language understanding unit 121 of the answer extraction processing unit 102 takes the basis R̂ and the question Q as input, and outputs the vector sequence H using the answer model parameters being learned that are stored in the answer model parameter storage unit 202 (step S205). The language understanding unit 121 performs the same processing as in step S103 of FIG. 2.
- Next, the answer extraction unit 122 of the answer extraction processing unit 102 takes the vector sequence H as input and outputs the answer Â using the answer model parameters being learned that are stored in the answer model parameter storage unit 202 (step S206). The answer extraction unit 122 performs the same processing as in step S104 of FIG. 2.
- Next, the parameter learning unit 103 calculates, as the loss L_A, the sum of the cross-entropy losses between the answer Â and the correct answer A (step S207). That is, the parameter learning unit 103 calculates the loss L_A as follows.
- Here, a_s is the vector whose elements are a_(s,i), and a_e is the vector whose elements are a_(e,i); i_s is the start point of the answer range indicated by the correct answer A, and j_e is its end point.
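- The loss L_A can be sketched as follows (illustrative: a_s and a_e stand in for softmax-normalized start-point and end-point distributions over tokens).

```python
import math

# Illustrative sketch of L_A: the sum of the cross-entropy losses for the
# start point i_s and the end point j_e of the correct answer range.

def answer_loss(a_s, a_e, i_start, j_end):
    return -math.log(a_s[i_start]) - math.log(a_e[j_end])

a_s = [0.1, 0.8, 0.1]   # toy start-point probabilities over tokens
a_e = [0.2, 0.2, 0.6]   # toy end-point probabilities over tokens
loss = answer_loss(a_s, a_e, 1, 2)
```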
- Next, the parameter learning unit 103 learns the basis model parameters using the loss L_R calculated in step S204, and learns the answer model parameters using the loss L_A calculated in step S207 (step S208). That is, the parameter learning unit 103 calculates the value of the loss L_R and its gradient, and updates the basis model parameters so as to minimize L_R; similarly, it calculates the value of the loss L_A and its gradient, and updates the answer model parameters so as to minimize L_A.
- the parameter learning unit 103 determines whether or not all the training data in the training data set has been selected as the processing target (step S209). If there is training data that has not yet been selected as the processing target (NO in step S209), the parameter learning unit 103 returns to step S201. As a result, the above steps S201 to S208 are executed for all the training data in the training data set.
- When all the training data has been selected, the parameter learning unit 103 determines whether the convergence condition is satisfied (step S210). If the convergence condition is satisfied (YES in step S210), the parameter learning unit 103 ends the learning process. Otherwise (NO in step S210), the parameter learning unit 103 treats all the training data in the training data set as not yet selected, and returns to step S201.
- The convergence condition is, for example, that the number of repetitions of steps S201 to S208 is equal to or greater than a predetermined number.
- As described above, the question answering device 10 at learning time can learn the basis model parameters and the answer model parameters from training data including the reference text P, the question Q, the correct answer A, and the correct answer basis R.
- In the above, the basis model parameters and the answer model parameters are learned in one learning process, but the present invention is not limited to this; they may be learned in separate learning processes.
- At learning (unsupervised learning) time, the question answering device 10 takes as input a set of training data (a training data set), each item of which includes a reference text P, a question Q related to P, and a correct answer A indicating the answer range of the true answer to Q.
- That is, the correct answer basis R indicating the true basis of the correct answer A is not given ("unsupervised" here means that the correct answer basis R is not given). Therefore, the parameters of the machine reading comprehension model can be learned even when the correct answer basis R is unavailable or does not exist.
- FIG. 5 is a diagram showing an example of the overall configuration of the question answering device 10 during learning (unsupervised learning). Note that FIG. 5 mainly describes the differences from the time of inference, and omits the description of the same components as those at the time of inference.
- As shown in FIG. 5, the question answering device 10 includes the basis extraction processing unit 101 and the answer extraction processing unit 102, which realize the machine reading comprehension model; the parameter learning unit 103, which learns the basis model parameters and the answer model parameters; the basis model parameter storage unit 201; and the answer model parameter storage unit 202.
- The basis model parameter storage unit 201 stores the basis model parameters being learned, and the answer model parameter storage unit 202 stores the answer model parameters being learned.
- In unsupervised learning, the parameter learning unit 103 learns both the basis model parameters and the answer model parameters using the loss of the answer Â.
- FIG. 6 is a flowchart showing an example of unsupervised learning processing according to the present embodiment.
- In the following, the basis model parameters and the answer model parameters are learned by online learning, but this is only an example; batch learning, mini-batch learning, and the like can also be applied.
- the parameter learning unit 103 selects one training data (that is, a set of the reference text P, the question Q, and the correct answer A) from the training data set as the processing target (step S301).
- Next, the language understanding unit 111 of the basis extraction processing unit 101 takes as input the reference text P and the question Q included in the training data selected in step S301, and outputs the question vector q and the sentence vector set {s_i} using the basis model parameters being learned that are stored in the basis model parameter storage unit 201 (step S302). The language understanding unit 111 performs the same processing as in step S101 of FIG. 2.
- Next, the basis extraction unit 112 of the basis extraction processing unit 101 takes the question vector q and the sentence vector set {s_i} as input, and outputs the basis R̂ using the basis model parameters being learned that are stored in the basis model parameter storage unit 201 (step S303).
- Since the argmax operation that extracts the basis sentence in Step 2 above is not differentiable, at learning time the basis extraction unit 112 extracts the basis sentence r̂_t by sampling with the Gumbel-softmax trick.
- That is, the basis extraction unit 112 extracts the sentence r̂_t at time t as follows.
- This equation means that the sentence is extracted by sampling from a predetermined first distribution. More specifically, it means extracting the sentence based on a score that is the sum of the basis score and a random variable following a predetermined second distribution (in this embodiment, a Gumbel distribution as an example).
- Here, the basis score is log(Pr(i; R̂_(t-1))) in the above formula, that is, the score representing the plausibility that sentence i is the basis of the answer.
- As noted above, the basis extraction operation argmax is not differentiable, and the operation of creating a one-hot vector for extracting a sentence from the sentence set is also non-differentiable. Therefore, when calculating the gradient of the loss L described later (that is, when back-propagating the error), the straight-through Gumbel-softmax estimator is used as an approximation of the derivative of the one-hot vector. That is, in the backward pass the one-hot vector is replaced by its continuous relaxation (a relaxation from discrete space to continuous space).
- Here, y is the vector having y_i as its elements.
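- The sampling and its straight-through relaxation can be sketched as follows (illustrative: the forward pass takes the argmax of log-probability plus Gumbel noise, while the softmax with temperature tau is what the backward pass would differentiate; all function names are hypothetical).

```python
import math, random

# Illustrative sketch of the Gumbel-max trick and its softmax relaxation:
# forward, extract argmax_i (log p_i + g_i) with Gumbel noise g_i; backward,
# the one-hot choice is approximated by a temperature-tau softmax over the
# same noisy scores (straight-through estimator).

def gumbel_noise():
    u = random.random()
    return -math.log(-math.log(u))

def gumbel_softmax_sample(log_probs, tau=1.0):
    noisy = [lp + gumbel_noise() for lp in log_probs]
    hard_idx = max(range(len(noisy)), key=noisy.__getitem__)  # forward: one-hot
    exps = [math.exp(x / tau) for x in noisy]
    z = sum(exps)
    soft = [e / z for e in exps]                              # backward: y_i
    return hard_idx, soft

random.seed(0)
log_p = [math.log(0.7), math.log(0.2), math.log(0.1)]
idx, y = gumbel_softmax_sample(log_p)
```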
- Next, the language understanding unit 121 of the answer extraction processing unit 102 takes the basis R̂ and the question Q as input, and outputs the vector sequence H using the answer model parameters being learned that are stored in the answer model parameter storage unit 202 (step S304). The language understanding unit 121 performs the same processing as in step S103 of FIG. 2.
- Next, the answer extraction unit 122 of the answer extraction processing unit 102 takes the vector sequence H as input and outputs the answer Â using the answer model parameters being learned that are stored in the answer model parameter storage unit 202 (step S305). The answer extraction unit 122 performs the same processing as in step S104 of FIG. 2.
- Next, the parameter learning unit 103 calculates the loss L, which includes the loss of the answer Â with respect to the correct answer A (step S306).
- The loss of the answer Â is computed from the answer's output probability distribution.
- Here, λ_C, λ_N, and λ_E are hyperparameters.
- The regularization term L_C represents a penalty for the information extracted as the basis not containing the information mentioned in the question. The regularization term L_C is
- where l_Q is the length of the question and l_R is the length of the string obtained by concatenating all the sentences included in the basis.
- L_C encourages each word i in the question to have at least one semantically close word j in the text extracted as the basis.
- The regularization term L_N represents a penalty for the basis not including the answer. The regularization term L_N is
- where L_N is a value obtained by applying ReLU (Rectified Linear Unit) as an activation function to the difference in sentence scores.
- A loss function used in ranking problems may also be used; for example, the regularization term L_N may be defined using RankNet.
- For RankNet, see, for example, Reference 2: C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, and G. N. Hullender, "Learning to rank using gradient descent", in ICML, pp. 89-96, 2005.
- The regularization term L_E is the entropy regularization often used in reinforcement learning and the like; it corresponds to the negative entropy of the sentence extraction distribution at one time step. Increasing the entropy has the effect of widening the search range of sentence extraction and stabilizing learning.
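- The term L_E can be sketched as follows (illustrative distributions; the model's actual distribution is over sentences at one extraction step).

```python
import math

# Illustrative sketch of L_E: the negative entropy of the sentence
# extraction distribution at one time step. Minimizing L_E (raising the
# entropy) widens the search over sentences.

def entropy_reg(probs):
    # sum p log p  (= negative entropy; <= 0 for any distribution)
    return sum(p * math.log(p) for p in probs if p > 0)

peaked = [0.98, 0.01, 0.01]
uniform = [1 / 3, 1 / 3, 1 / 3]
```

A peaked distribution gives a larger (less negative) L_E than a uniform one, so the penalty pushes the extraction distribution away from premature over-confidence.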
- the parameter learning unit 103 learns the basis model parameter and the answer model parameter using the loss L calculated in step S306 above (step S307). That is, the parameter learning unit 103 calculates the value of the loss L and its gradient, and updates the basis model parameter and the answer model parameter so that the value of the loss L is minimized.
- the parameter learning unit 103 determines whether or not all the training data in the training data set has been selected as the processing target (step S308). If there is training data that has not yet been selected as the processing target (NO in step S308), the parameter learning unit 103 returns to step S301. As a result, the above steps S301 to S307 are executed for all the training data in the training data set.
- the parameter learning unit 103 determines whether or not the convergence condition is satisfied (step S309).
- The convergence condition is, for example, that the number of repetitions of steps S301 to S307 is equal to or greater than a predetermined number.
- As described above, the question answering device 10 at learning time can learn the basis model parameters and the answer model parameters from training data including only the reference text P, the question Q, and the correct answer A (that is, even when the correct answer basis R is not input).
- In unsupervised learning, it is preferable to perform pre-training to stabilize learning. If the correct answer basis R exists, the supervised learning described above may serve as pre-training. On the other hand, when the correct answer basis R does not exist, pre-training may be performed by semi-supervised learning using a pseudo correct-answer basis.
- Such a pseudo correct-answer basis may be, for example, the set of sentences whose label value is equal to or above a predetermined threshold, after giving each sentence a label indicating how plausible it is as a basis.
- The label value may be determined by an arbitrary formula; for example, the TF-IDF similarity between the sentence and the question can be used. Further, at least one of the sentences in the set S_A of sentences containing the answer is made to be included in the pseudo correct-answer basis.
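- The construction of a pseudo correct-answer basis can be sketched as follows (illustrative: plain word overlap stands in for the TF-IDF similarity mentioned above, and the threshold and all names are hypothetical).

```python
# Illustrative sketch: score each sentence against the question, keep
# sentences above a threshold, and force in one answer-bearing sentence
# (so the pseudo-basis always overlaps the set S_A).

def pseudo_basis(sentences, question, answer, threshold=0.2):
    q_words = set(question.lower().split())
    basis = set()
    for i, sent in enumerate(sentences):
        words = set(sent.lower().split())
        overlap = len(words & q_words) / max(len(q_words), 1)
        if overlap >= threshold:
            basis.add(i)
    for i, sent in enumerate(sentences):   # ensure an answer-bearing sentence
        if answer.lower() in sent.lower():
            basis.add(i)
            break
    return sorted(basis)

sents = ["Alice founded the lab in 1990.",
         "The weather was mild.",
         "The lab is in Kyoto."]
pb = pseudo_basis(sents, "Where is the lab founded by Alice?", "Kyoto")
```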
- HotpotQA is a dataset with correct answer bases (that is, teacher data for the basis).
- The question Q is created to ask about content that spans two paragraphs in Wikipedia.
- The reference text P is a text in which the two paragraphs are concatenated.
- The output is the answer A and the basis R.
- Answer A is one of the answer labels ⁇ yes, no, span ⁇ and the answer section (answer range).
- the response interval exists only when the response label is'span'. Therefore, in the response model, in addition to the classification of the response section, the response label was also classified. Since the question Q is limited to questions that ask the content that spans two paragraphs, the basis R is two or more sentences.
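The two-part answer decoding described above (a label from {yes, no, span}, plus a span that exists only when the label is 'span') can be sketched as follows. The logit inputs are hypothetical placeholders for the answer model's outputs, not the embodiment's actual interface.

```python
def decode_answer(label_logits, start_logits, end_logits, tokens):
    """Decode the answer model's output: pick an answer label from
    {yes, no, span}; only for 'span' is an answer range extracted."""
    labels = ['yes', 'no', 'span']
    label = labels[max(range(len(labels)), key=lambda i: label_logits[i])]
    if label != 'span':
        return label, None                  # yes/no answers carry no span
    start = max(range(len(tokens)), key=lambda i: start_logits[i])
    # the end position is constrained to lie at or after the start position
    end = max(range(start, len(tokens)), key=lambda i: end_logits[i])
    return label, ' '.join(tokens[start:end + 1])
```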
- For HotpotQA, see, for example, Reference 3: "Z. Yang, P. Qi, S. Zhang, Y. Bengio, W. W. Cohen, R. Salakhutdinov, and C. D. Manning. HotpotQA: A dataset for diverse, explainable multi-hop question answering. In EMNLP, pp. 2369-2380, 2018."
- The baseline model is a model without the basis model; the reference text P and the question Q are input directly to the answer model.
- As the methods of the present embodiment, supervised learning and additional learning (unsupervised learning performed after supervised learning) were evaluated.
- the batch size is 60
- the number of epochs is 3
- the optimization method is Adam
- the learning rate is 5e-5.
- For unsupervised learning, the number of epochs is 1, the learning rate is 5e-6, and τ is 0.001.
- Learning was performed using a GPU (Graphics Processing Unit) with λC set to 0, λN set to 1, and λE set to 0.001.
- Table 1 below shows the evaluation results of the answers and grounds when the experiment was conducted with the above data set and experiment settings.
- Table 1 above shows the evaluation results of EM (Exact match) / F1.
- Table 4 above shows the evaluation results of EM / F1.
- In this evaluation, the interpretable machine reading model and learning by additional learning were evaluated. It was confirmed that, compared with a normal machine reading model that uses only the answer model, the accuracy of the interpretable machine reading model is improved by extracting the rationale beforehand. Furthermore, in additional learning, it was confirmed that both the answer model and the basis model learn in a way that improves answer accuracy.
- By making the machine reading model interpretable, it is possible to address social issues of conventional machine reading, such as convincing the user, clarifying the information source, and verifying facts. Further, by extending the additional learning described in the present embodiment to unsupervised learning from scratch, the rationale can be extracted even for a dataset that has no teacher data for the basis.
- In the present embodiment, the rationale is first extracted by the rationale model and the answer is then extracted from it by the answer model. More generally, the embodiment can be applied to any task realized by a process in which a first model extracts (or retrieves) a first substring and a second model then extracts a second substring from the first substring based on a predetermined condition. For example, it can be applied to a task in which the first model retrieves a paragraph from a document and the second model performs reading comprehension (answer extraction, etc.) on that paragraph.
- FIG. 7 is a diagram showing an example of the hardware configuration of the question answering device 10 according to the present embodiment.
- The question answering device 10 is realized by a general computer or computer system and has an input device 301, a display device 302, an external I/F 303, a communication I/F 304, a processor 305, and a memory device 306. These hardware components are communicably connected via a bus 307.
- the input device 301 is, for example, a keyboard, a mouse, a touch panel, or the like.
- the display device 302 is, for example, a display or the like.
- The question answering device 10 need not include at least one of the input device 301 and the display device 302.
- the external I / F 303 is an interface with an external device.
- the external device includes a recording medium 303a and the like.
- the question answering device 10 can read or write the recording medium 303a via the external I / F 303.
- The recording medium 303a may store one or more programs that realize each functional unit of the question answering device 10 (the rationale extraction processing unit 101, the answer extraction processing unit 102, and the parameter learning unit 103).
- the recording medium 303a includes, for example, a CD (Compact Disc), a DVD (Digital Versatile Disk), an SD memory card (Secure Digital memory card), a USB (Universal Serial Bus) memory card, and the like.
- the communication I / F 304 is an interface for the question answering device 10 to connect to the communication network.
- One or more programs that realize each functional unit of the question answering device 10 may be acquired (downloaded) from a predetermined server device or the like via the communication I / F 304.
- the processor 305 is, for example, various arithmetic units such as a CPU (Central Processing Unit) and a GPU. Each functional unit included in the question answering device 10 is realized, for example, by a process in which one or more programs stored in the memory device 306 are executed by the processor 305.
- the memory device 306 is, for example, various storage devices such as HDD (Hard Disk Drive), SSD (Solid State Drive), RAM (Random Access Memory), ROM (Read Only Memory), and flash memory.
- the basis model parameter storage unit 201 and the answer model parameter storage unit 202 included in the question answering device 10 can be realized by using, for example, the memory device 306.
- Further, at least one of the basis model parameter storage unit 201 and the answer model parameter storage unit 202 may be realized by using a storage device (for example, a database server) connected to the question answering device 10 via a communication network.
- the question answering device 10 can realize the above-mentioned inference processing, supervised learning processing, and unsupervised learning processing.
- the hardware configuration shown in FIG. 7 is an example, and the question answering device 10 may have another hardware configuration.
- the question answering device 10 may have a plurality of processors 305 or a plurality of memory devices 306.
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Library & Information Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
- Answer model: a model that takes the basis and the question as input and outputs the answer

Here, the basis is a set of substrings of the reference text. In the present embodiment, the basis is assumed to be a set of sentences. However, the basis is not limited to this; it may be a set of character strings longer than a sentence (for example, paragraphs) or a set of character strings shorter than a sentence (for example, phrases).

First, assuming that the parameters of the machine reading model have already been learned, the case where machine reading is performed by the machine reading model using the learned parameters will be described. At inference time, the reference text P and a question Q related to the reference text P are input to the question answering device 10.

The overall configuration of the question answering device 10 at inference time will be described with reference to FIG. 1. FIG. 1 is a diagram showing an example of the overall configuration of the question answering device 10 at inference time.

Next, the inference processing according to the present embodiment will be described with reference to FIG. 2. FIG. 2 is a flowchart showing an example of the inference processing according to the present embodiment.

Step 1: The rationale extraction unit 112 obtains the score of sentence i from the question vector q_t as follows.
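The embodiment's actual scoring formula is given by its own equation (omitted in this text); as a minimal stand-in, scoring each sentence vector against the question vector with a dot product can be sketched as:

```python
def sentence_scores(question_vec, sentence_vecs):
    """Score each sentence i against the question vector q_t.
    A plain dot product is used here as an assumed stand-in for the
    embodiment's scoring formula."""
    return [sum(q * h for q, h in zip(question_vec, h_i))
            for h_i in sentence_vecs]
```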
Next, assuming that the parameters of the machine reading model have not been learned yet, the case where these parameters are learned by supervised learning will be described. At training (supervised learning) time, a set of training data (a training data set) is input to the question answering device 10, where each training datum includes the reference text P, a question Q related to the reference text P, a correct answer A indicating the answer range of the true answer to the question Q, and a correct answer basis R indicating the true basis of the correct answer A.

The overall configuration of the question answering device 10 at training (supervised learning) time will be described with reference to FIG. 3. FIG. 3 is a diagram showing an example of the overall configuration of the question answering device 10 at training (supervised learning) time. In FIG. 3, mainly the differences from inference time are described, and the description of components that are the same as at inference time is omitted.

Next, the supervised learning processing according to the present embodiment will be described with reference to FIG. 4. FIG. 4 is a flowchart showing an example of the supervised learning processing according to the present embodiment. In the following, the case where the rationale model parameters and the answer model parameters are learned by online learning is described, but this is only an example; batch learning, mini-batch learning, and the like are also applicable.

Next, the case where the parameters of the machine reading model are learned by unsupervised learning will be described. At training (unsupervised learning) time, a set of training data (a training data set) including the reference text P, a question Q related to the reference text P, and a correct answer A indicating the answer range of the true answer to the question Q is input to the question answering device 10. Thus, in unsupervised learning, the correct answer basis R indicating the true basis of the correct answer A is not given (that is, "unsupervised" means that the correct answer basis R is not given). Therefore, the parameters of the machine reading model can be learned even when the correct answer basis R is unavailable or does not exist.

The overall configuration of the question answering device 10 at training (unsupervised learning) time will be described with reference to FIG. 5. FIG. 5 is a diagram showing an example of the overall configuration of the question answering device 10 at training (unsupervised learning) time. In FIG. 5, mainly the differences from inference time are described, and the description of components that are the same as at inference time is omitted.

Next, the unsupervised learning processing according to the present embodiment will be described with reference to FIG. 6. FIG. 6 is a flowchart showing an example of the unsupervised learning processing according to the present embodiment. In the following, the case where the rationale model parameters and the answer model parameters are learned by online learning is described, but this is only an example; batch learning, mini-batch learning, and the like are also applicable.
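The unsupervised learning relies on sampling rationale sentences from a distribution parameterized by the basis scores while keeping the operation differentiable through a continuous relaxation with temperature τ. One common realization of such relaxed sampling, assumed here purely for illustration (a Gumbel-Softmax-style draw; the embodiment's exact distribution may differ), looks like:

```python
import math
import random

def relaxed_sample(scores, tau=0.001, rng=random):
    """Draw a continuously relaxed (differentiable-in-principle) sample from
    the categorical distribution parameterized by the basis scores;
    tau is the relaxation temperature."""
    gumbels = []
    for _ in scores:
        u = rng.random() or 1e-12           # guard against log(0)
        gumbels.append(-math.log(-math.log(u)))
    logits = [(s + g) / tau for s, g in zip(scores, gumbels)]
    m = max(logits)                          # subtract max for numeric stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]             # near one-hot for small tau
```

With a small τ such as 0.001 the output is almost exactly one-hot, yet the operation remains smooth in the scores, which is what allows the gradient to be computed by error backpropagation.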
The evaluation of the present embodiment is described below.

The evaluation was performed using HotpotQA, a dataset with a correct answer basis (that is, teacher data for the basis). In HotpotQA, each question Q is created so as to ask about content that spans two paragraphs in Wikipedia. The reference text P is a text in which the two paragraphs are concatenated. The output is the answer A and the basis R. The answer A consists of one of the answer labels {yes, no, span} and an answer span (answer range). The answer span exists only when the answer label is 'span'. Therefore, the answer model classifies the answer label in addition to classifying the answer span. Since the question Q is limited to questions that ask about content spanning two paragraphs, the basis R consists of two or more sentences. Hereinafter, for convenience, among the sentences included in the basis R, a sentence that contains the answer A is called an answer sentence, and a sentence that does not contain the answer but is necessary for answering is called an auxiliary sentence. For HotpotQA, see, for example, Reference 3: "Z. Yang, P. Qi, S. Zhang, Y. Bengio, W. W. Cohen, R. Salakhutdinov, and C. D. Manning. HotpotQA: A dataset for diverse, explainable multi-hop question answering. In EMNLP, pp. 2369-2380, 2018."

In this evaluation, three methods using BERT_Base were compared. The baseline model is a model without the basis model; the reference text P and the question Q are input directly to the answer model. As the methods of the present embodiment, supervised learning and additional learning (unsupervised learning performed after supervised learning) were evaluated. For supervised learning, the batch size was 60, the number of epochs was 3, the optimization method was Adam, and the learning rate was 5e-5. For unsupervised learning, the number of epochs was 1, the learning rate was 5e-6, τ was 0.001, λC was 0, λN was 1, and λE was 0.001. Learning was performed using a GPU (Graphics Processing Unit).

Table 1 below shows the evaluation results of the answers and the basis when the experiment was conducted with the above dataset and experimental settings.

Regarding answer accuracy, the methods of the present embodiment (supervised learning and additional learning) outperformed the baseline. In particular, by outperforming the baseline, it was confirmed that the interpretable machine reading model, which applies the basis model before the answer model, can answer more accurately than the answer model alone. This is considered to be because the basis model has the effect of removing unnecessary text, which makes inference in the answer model easier.

Table 2 below shows the change in the basis output between supervised learning and additional learning.

In order to evaluate the performance of the answer model alone, the development data was classified into four domains according to the basis prediction results: "all", "exact match", "excess", and "deficiency". "All" is the domain of all development data; "exact match" is the domain of data for which the basis extraction result ^R exactly matched the true basis R (R = ^R) in both supervised learning and additional learning; "excess" is the domain of data for which the basis extraction result ^R was excessive with respect to the true basis R (R is a proper subset of ^R); and "deficiency" is the domain of data for which the basis extraction result ^R was deficient with respect to the true basis R (^R is a proper subset of R). Samples whose answer label is 'span' but for which no answer sentence was extracted cannot be answered, and so they were not used in the analysis. Table 4 below shows the evaluation results in this case.
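The four-domain breakdown above is a straightforward set comparison between the true basis R and the extracted basis ^R; it can be sketched as follows (the function name and the index-set representation are illustrative assumptions):

```python
def basis_domain(true_basis, extracted):
    """Classify one prediction against the true basis R:
    exact match (R = ^R), excess (R is a proper subset of ^R),
    deficiency (^R is a proper subset of R), or other.
    Inputs are sets of sentence indices."""
    R, R_hat = set(true_basis), set(extracted)
    if R == R_hat:
        return "exact match"
    if R < R_hat:       # extracted more than the true basis
        return "excess"
    if R_hat < R:       # extracted less than the true basis
        return "deficiency"
    return "other"
```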
In the present embodiment, an interpretable machine reading model was defined, and a learning method using unsupervised learning was proposed for the first time.

Finally, the hardware configuration of the question answering device 10 according to the present embodiment will be described with reference to FIG. 7. FIG. 7 is a diagram showing an example of the hardware configuration of the question answering device 10 according to the present embodiment.
101 Rationale extraction processing unit
102 Answer extraction processing unit
103 Parameter learning unit
111 Language understanding unit
112 Rationale extraction unit
121 Language understanding unit
122 Answer extraction unit
201 Basis model parameter storage unit
202 Answer model parameter storage unit
Claims (7)
- A learning device comprising: rationale extraction means that takes as input a text and a question related to the text, calculates, using model parameters of a first neural network, a basis score representing the likelihood that a character string included in the text serves as a basis for an answer to the question, and extracts from the text a first set indicating a set of character strings serving as the basis of the answer by sampling from a predetermined distribution having the basis score as a parameter; answer extraction means that takes as input the question and the first set and extracts the answer from the first set using model parameters of a second neural network; and first learning means that learns the model parameters of the first neural network and the model parameters of the second neural network by computing gradients by error backpropagation using a first loss between the answer and a true answer to the question together with a continuous relaxation.
- The learning device according to claim 1, wherein the first learning means calculates the first loss by a loss function that includes a term imposing a penalty when the information represented by the character strings extracted by the rationale extraction means does not encompass the information referred to by the question, and a term imposing a penalty when the answer is not included in the character strings extracted by the rationale extraction means.
- The learning device according to claim 1 or 2, wherein the rationale extraction means takes as input the text, the question related to the text, and a second set indicating a set of character strings that constitute the correct answer basis of the answer to the question, and extracts the character strings included in the first set from the second set using the model parameters of the first neural network; the learning device further comprises second learning means that learns the model parameters of the first neural network using a second loss between the first set and the second set and learns the model parameters of the second neural network using a third loss between the answer and the true answer; and the learning by the first learning means is performed after the learning by the second learning means.
- An information processing device comprising: rationale extraction means that takes as input a text and a question related to the text and extracts from the text a set of character strings serving as the basis of an answer to the question, using model parameters of a first neural network; and answer extraction means that takes as input the question and the set and extracts the answer from the set using model parameters of a second neural network.
- A learning method in which a computer executes: a rationale extraction procedure of taking as input a text and a question related to the text, calculating, using model parameters of a first neural network, a basis score representing the likelihood that a character string included in the text serves as a basis for an answer to the question, and extracting from the text a first set indicating a set of character strings serving as the basis of the answer by sampling from a predetermined distribution having the basis score as a parameter; an answer extraction procedure of taking as input the question and the first set and extracting the answer from the first set using model parameters of a second neural network; and a first learning procedure of learning the model parameters of the first neural network and the model parameters of the second neural network by computing gradients by error backpropagation using a first loss between the answer and a true answer to the question together with a continuous relaxation.
- An information processing method in which a computer executes: a rationale extraction procedure of taking as input a text and a question related to the text and extracting from the text a set of character strings serving as the basis of an answer to the question, using model parameters of a first neural network; and an answer extraction procedure of taking as input the question and the set and extracting the answer from the set using model parameters of a second neural network.
- A program for causing a computer to function as each means of the learning device according to any one of claims 1 to 3 or as each means of the information processing device according to claim 4.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2020/009806 WO2021176714A1 (ja) | 2020-03-06 | 2020-03-06 | 学習装置、情報処理装置、学習方法、情報処理方法及びプログラム |
JP2022504937A JP7452623B2 (ja) | 2020-03-06 | 2020-03-06 | 学習装置、情報処理装置、学習方法、情報処理方法及びプログラム |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2020/009806 WO2021176714A1 (ja) | 2020-03-06 | 2020-03-06 | 学習装置、情報処理装置、学習方法、情報処理方法及びプログラム |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021176714A1 true WO2021176714A1 (ja) | 2021-09-10 |
Family
ID=77613967
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2020/009806 WO2021176714A1 (ja) | 2020-03-06 | 2020-03-06 | 学習装置、情報処理装置、学習方法、情報処理方法及びプログラム |
Country Status (2)
Country | Link |
---|---|
JP (1) | JP7452623B2 (ja) |
WO (1) | WO2021176714A1 (ja) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20220245322A1 (en) * | 2021-01-29 | 2022-08-04 | Salesforce.Com, Inc. | Machine-learning based generation of text style variations for digital content items |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2019220142A (ja) * | 2018-06-18 | 2019-12-26 | 日本電信電話株式会社 | 回答学習装置、回答学習方法、回答生成装置、回答生成方法、及びプログラム |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP6721179B2 (ja) | 2016-10-05 | 2020-07-08 | 国立研究開発法人情報通信研究機構 | 因果関係認識装置及びそのためのコンピュータプログラム |
- 2020-03-06 — WO application PCT/JP2020/009806 filed (publication WO2021176714A1, ja): Application Filing
- 2020-03-06 — JP application JP2022504937A filed (patent JP7452623B2, ja): Active
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2019220142A (ja) * | 2018-06-18 | 2019-12-26 | 日本電信電話株式会社 | 回答学習装置、回答学習方法、回答生成装置、回答生成方法、及びプログラム |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20220245322A1 (en) * | 2021-01-29 | 2022-08-04 | Salesforce.Com, Inc. | Machine-learning based generation of text style variations for digital content items |
US11694018B2 (en) * | 2021-01-29 | 2023-07-04 | Salesforce, Inc. | Machine-learning based generation of text style variations for digital content items |
Also Published As
Publication number | Publication date |
---|---|
JPWO2021176714A1 (ja) | 2021-09-10 |
JP7452623B2 (ja) | 2024-03-19 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20922654; Country of ref document: EP; Kind code of ref document: A1
| ENP | Entry into the national phase | Ref document number: 2022504937; Country of ref document: JP; Kind code of ref document: A
| WWE | Wipo information: entry into national phase | Ref document number: 17908898; Country of ref document: US
| NENP | Non-entry into the national phase | Ref country code: DE
| 122 | Ep: pct application non-entry in european phase | Ref document number: 20922654; Country of ref document: EP; Kind code of ref document: A1