KR102355591B1

KR102355591B1 - Method and apparatus of generating question-answer learning model through reinforcement learning

Info

Publication number: KR102355591B1
Application number: KR1020190176327A
Authority: KR
Inventors: 김동환; 정우태; 성기봉
Original assignee: 주식회사 포티투마루
Priority date: 2019-12-27
Filing date: 2019-12-27
Publication date: 2022-01-26
Also published as: KR20210083731A

Abstract

The present invention relates to a method of operating a question-and-answer model through reinforcement learning in an apparatus that generates a response to a query. The method comprises: sampling, in a first agent, a latent variable from an arbitrary paragraph; extracting a data set of questions and answers from the paragraph based on the latent variable; determining, in a second agent, whether to apply the extracted question-and-answer data set to the training of a question-and-answer model that generates a response to an arbitrary query; and applying the change in performance of the question-and-answer model to the first agent and the second agent as a reward.

Description

Method and device for generating a question-and-answer learning model through reinforcement learning {METHOD AND APPARATUS OF GENERATING QUESTION-ANSWER LEARNING MODEL THROUGH REINFORCEMENT LEARNING}

The present invention relates to a method and apparatus for generating a data set that improves the performance of a QA model that generates a response to a query. More specifically, the present invention relates to a method of extracting question-and-answer pairs from an arbitrary context such that a QA model can improve its performance by learning those pairs.

With recent advances in technology, various question-and-answer systems that apply deep learning have been developed. Recently, systems have been demonstrated in which, when a user poses a question in natural language, a question-and-answer (QA) model automatically produces a matching answer. However, such deep-learning-based QA models require a large amount of training data, so they have been limited to answering automatically only within certain specific fields or topics. Furthermore, building the training data needed for a QA model to reach a given level of reliability takes considerable time and cost.

Meanwhile, recent question-and-answer learning models are pre-trained on a large corpus with a language model objective, so their performance is already quite high. As a result, simply using more training data does not improve the performance of the model proportionally. It is therefore becoming very important to secure sample data that improves machine learning performance efficiently.

Registered Patent Publication No. 10-1605430, 2016.03.22.

An object of the present invention is to provide a method for efficiently extracting, from an arbitrary context, training data for a QA model that generates a response to a query. A further object is to provide a method of generating a training data set that efficiently improves the performance of the QA model by giving the performance of the updated QA model as a reward to the model that extracts the question-and-answer pairs.

In addition, unlike existing question-and-answer (QA) models, for which a user had to manually collect question-and-answer pairs in order to train on their joint distribution, the present invention aims to provide a question-and-answer generation model that simultaneously generates the question-and-answer data set that can be derived from an input paragraph even when no candidate answers are given, and a method and apparatus for generating a question-and-answer learning model that can be trained automatically using that generation model.

In an apparatus for generating a response to a query according to an embodiment of the present invention, a method of operating a question-and-answer model through reinforcement learning may include: sampling, in a first agent, a latent variable from an arbitrary paragraph; extracting a data set of questions and answers from the paragraph based on the latent variable; determining, in a second agent, whether to apply the extracted question-and-answer data set to the training of a question-and-answer model that generates a response to an arbitrary query; and applying the change in performance of the question-and-answer model to the first agent and the second agent as a reward.

Furthermore, a method of generating a question-and-answer learning model through adversarial learning according to an aspect of the present invention for solving the above problem includes: sampling a latent variable from an input paragraph based on a constraint; generating an answer based on the latent variable; generating a question based on the answer; and machine-learning a question-and-answer learning model using the data set of generated questions and answers, wherein the constraint may be controlled so that the latent variable increases the loss of the question-and-answer learning model while remaining within the data manifold.

In addition, sampling the latent variable may further include: generating a hidden representation for each word in the input paragraph and storing it in a memory; and sampling the latent variable based on the hidden representations.

In addition, generating the answer may further include: generating a weighted sum vector by calculating the importance of the hidden representations stored in the memory through attention that uses the sampled latent variable as a query; and generating an answer span based on the latent variable and the weighted sum vector.
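The attention step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes dot-product scoring with a softmax, and it assumes the latent variable has the same dimension as each hidden representation. The text does not specify either choice, and the function name `attention_weighted_sum` is hypothetical.

```python
import math
from typing import List


def attention_weighted_sum(query: List[float], memory: List[List[float]]) -> List[float]:
    """Weighted sum of hidden vectors, using the latent variable as the query.

    Assumption: importance is a dot product followed by a softmax.
    """
    # Importance score of each stored hidden representation.
    scores = [sum(q * h for q, h in zip(query, hidden)) for hidden in memory]
    # Softmax, shifted by the maximum score for numerical stability.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum vector over the memory.
    dim = len(memory[0])
    return [sum(w * hidden[i] for w, hidden in zip(weights, memory)) for i in range(dim)]
```

With a zero query every hidden representation receives the same weight, so the result is the plain average of the memory. An answer-span predictor would then consume this vector together with the latent variable.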

In addition, the input paragraph may be obtained by structuring infobox data extracted from data crawled from a website.

In addition, when no infobox data is extracted from the crawled data, a passage standardized on the basis of data frequency in a table recognized through column recognition may be input as the paragraph.

In addition, the input paragraph may be machine-translated data, and the method may further include evaluating the quality of the machine translation by evaluating the accuracy of the answers based on the generated question-and-answer data set.

An apparatus for generating a question-and-answer learning model through adversarial learning according to another aspect of the present invention includes: one or more processors; and one or more memories storing instructions that, when executed by the one or more processors, cause the one or more processors to perform operations, wherein the operations may include: an operation of sampling a latent variable from an input paragraph based on constraints; an operation of generating an answer based on the latent variable; an operation of generating a question based on the answer; and an operation of machine-learning a question-and-answer learning model using the data set of generated questions and answers.

In addition, the constraints may be controlled so that the latent variable increases the loss of the question-and-answer learning model while remaining within the data manifold.

In addition, the operation of sampling the latent variable may further include: an operation of generating a hidden representation for each word in the input paragraph and storing it in a memory; and an operation of sampling the latent variable based on the hidden representations.

In addition, the operation of generating the answer may further include: an operation of generating a weighted sum vector by calculating the importance of the hidden representations stored in the memory through attention that uses the sampled latent variable as a query; and an operation of generating an answer span based on the latent variable and the weighted sum vector.

In addition, according to another embodiment, a program may be provided that is stored in a medium and, in combination with a computer as hardware, executes the generation of a question-and-answer data set based on a paragraph input.

Other specific details of the invention are included in the detailed description and drawings.

According to an embodiment of the present invention, a data set of questions and answers for training the QA model is generated automatically from an arbitrary context, so several question-and-answer pairs can be sampled from a single input paragraph, and more diverse data can be built than with previous models. In addition, machine-learning the data built in this way can improve the performance of the question-and-answer learning model.

Effects of the present invention are not limited to the effects mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the following description.

FIG. 1A is a conceptual diagram schematically illustrating a machine learning question-and-answer learning model according to an embodiment.
FIG. 1B is a conceptual diagram schematically illustrating a question-and-answer learning model through adversarial learning according to an embodiment.
FIG. 2 is a diagram for explaining how the performance of a question-and-answer model is improved according to an embodiment.
FIG. 3A is a flowchart illustrating a method of extracting a question-and-answer data set from a paragraph and training a QA model.
FIG. 3B is a flowchart illustrating a method of generating a question-and-answer data set based on a paragraph input according to an embodiment.
FIG. 4 is a conceptual diagram schematically illustrating a question-and-answer data set generation model based on a paragraph input.
FIG. 5 is a diagram for explaining an example of a method of generating a question-and-answer data set from a passage according to an embodiment.
FIG. 6 is a diagram for explaining a method of generating a question-and-answer data set from a structured infobox according to an embodiment.
FIG. 7 is a diagram for explaining a method of generating a question-and-answer data set from an unstructured page according to an embodiment.
FIG. 8 is a block diagram schematically illustrating the internal configuration of an apparatus for generating a question-and-answer data set according to an embodiment.

Advantages and features of the present invention, and methods of achieving them, will become apparent with reference to the embodiments described in detail below in conjunction with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. The embodiments are provided only so that the disclosure of the present invention is complete and so that those of ordinary skill in the art to which the present invention pertains are fully informed of the scope of the invention, which is defined only by the scope of the claims.

The terminology used herein is for the purpose of describing the embodiments and is not intended to limit the present invention. In this specification, the singular also includes the plural unless specifically stated otherwise. As used herein, "comprises" and/or "comprising" does not exclude the presence or addition of one or more other components in addition to the stated components. Like reference numerals refer to like elements throughout, and "and/or" includes each and every combination of one or more of the recited elements. Although "first", "second", and so on are used to describe various elements, these elements are of course not limited by these terms. These terms are only used to distinguish one component from another. Accordingly, the first component mentioned below may also be the second component within the technical spirit of the present invention.

Unless otherwise defined, all terms (including technical and scientific terms) used herein have the meaning commonly understood by those of ordinary skill in the art to which this invention belongs. In addition, terms defined in commonly used dictionaries are not to be interpreted ideally or excessively unless explicitly and specifically defined.

Spatially relative terms such as "below", "beneath", "lower", "above", and "upper" may be used to easily describe the relationship between one component and other components as shown in the drawings. Spatially relative terms should be understood as encompassing different orientations of the components during use or operation in addition to the orientation shown in the drawings. For example, if a component shown in the drawings is turned over, a component described as "below" or "beneath" another component may be placed "above" that component. Accordingly, the exemplary term "below" may encompass both the below and above orientations. Components may also be oriented in other directions, and the spatially relative terms may be interpreted according to the orientation.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a conceptual diagram schematically illustrating a question-and-answer learning model through machine learning according to an embodiment.

Referring to FIG. 1A, a question-and-answer learning model 100 using reinforcement learning according to an embodiment of the present invention may include a decoder 140 that extracts a question-and-answer set from a paragraph, a QA model 170 that provides a response to an arbitrary query, a first agent 122 that extracts a latent variable from the paragraph, and a second agent 160 that determines whether to apply the question-and-answer data set extracted from the paragraph to the QA model 170.

More specifically, when a paragraph 101 is input, the first agent 122 may generate a latent variable 103 from the context of the paragraph. The latent variable 103 includes a vector representation of the paragraph obtained by processing the contexts contained in the paragraph 101.

When the latent variable 103 is input to the decoder 140, the decoder 140 may generate a data set of questions and answers based on the latent variable 103.

The second agent 160 may determine whether to apply the generated question-and-answer data set to the training of the QA model 170.

Using the question-and-answer pairs generated by the decoder 140, that is, the QA pairs, directly to train the QA model 170 raises two problems. First, most of the generated QA pairs resemble data the QA model has already learned, so training the QA model 170 on them risks overfitting. Second, the generated QA pairs contain noise, which can degrade the performance of the QA model unless it is filtered out.

Therefore, according to an embodiment of the present invention, the second agent 160 may determine whether to apply a QA pair generated by the decoder 140 to the training of the QA model 170. That is, the second agent keeps only the QA pairs that can improve the performance of the QA model and filters out the remaining data as noise, which addresses the noise problem described above.
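One way to picture the second agent's keep-or-drop decision is as a stochastic policy over each generated QA pair. The sketch below is only an assumption about how such a filter could look: the patent does not say how the second agent is parameterized, so the logistic policy, the per-pair feature vectors, and the names `keep_probability` and `filter_pairs` are all hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

QAPair = Tuple[str, str]  # (question, answer)


def keep_probability(features: Sequence[float], weights: Sequence[float]) -> float:
    """Hypothetical logistic policy: probability of keeping a QA pair."""
    logit = sum(f * w for f, w in zip(features, weights))
    return 1.0 / (1.0 + math.exp(-logit))


def filter_pairs(
    pairs: List[QAPair],
    features: List[Sequence[float]],
    weights: Sequence[float],
    rng: random.Random,
) -> List[QAPair]:
    """Sample a keep/drop action for every pair and return the kept pairs."""
    kept = []
    for pair, feats in zip(pairs, features):
        if rng.random() < keep_probability(feats, weights):
            kept.append(pair)
    return kept
```

Under reinforcement learning, the policy weights would be adjusted so that keep decisions followed by a better QA model become more likely; that update is not shown here.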

Furthermore, according to an embodiment of the present invention, the first agent 122 does not generate latent variables from the context according to predefined rules; it is a machine learning model that learns to generate latent variables that help train the QA model. Because the first agent thereby forms a latent variable space that can improve the performance of the QA model, the overfitting problem described above can be addressed.

The QA model 170 may then be updated by learning the QA pairs provided by the second agent 160.

Meanwhile, according to an embodiment of the present invention, the first agent 122 is a machine learning model that generates the latent variable 103 by processing the contexts contained in an arbitrary paragraph 101 and expressing them in vector form. Likewise, the second agent 160 is a machine learning model that determines whether to apply the generated question-and-answer data set to the training of the QA model 170.

That is, the first agent generates latent variables from the paragraph in order to produce the QA pairs needed to train the QA model, and the second agent determines whether to apply the generated QA pairs to that training, so the performance of the first agent 122 and the second agent 160 is directly tied to the performance of the QA model 170.

Therefore, in order to couple and jointly raise the performance of the first agent model 122, the second agent model 160, and the QA model 170, the question-and-answer learning model 100 according to an embodiment of the present invention gives the performance of the updated QA model to the first agent model 122 and the second agent model 160 as a reward, and thereby trains both agent models by reinforcement learning. As a result, the question-and-answer learning model 100 can generate a training data set that efficiently improves the performance of the QA model 170.

Referring again to FIG. 1A, the reinforcement-learning-based question-and-answer learning model 100 described above may have the objective function of Equation 1 below.

Here, x, y, c, and z denote the question, the answer, the context of the paragraph, and the latent variable, respectively. Using these, the parameters that maximize the objective function can be estimated.

The objective function of Equation 1 is for training a model that can generate QA pairs and is not directly related to the reinforcement learning of the question-and-answer learning model 100. Reinforcement learning is intended to improve the performance of the first agent model 122 and the second agent model 160.

The first agent model and the second agent model of FIG. 1A may have the objective function of Equation 2 below.

Here, the F1 score denotes the performance or accuracy of the QA model: a measure of how accurately the QA model can extract the answer to an arbitrary query. According to Equation 2, the performance of the QA model is applied as a reward in training the first agent and the second agent, so both are trained in the direction that raises the performance of the QA model. That is, the first agent is trained to extract from the paragraph latent variables that can improve the performance of the QA model, and the second agent is trained to keep, among the QA pairs generated by the decoder, only the data that can improve the performance of the QA model and to filter out the rest as noise.
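For concreteness, the F1 score of a predicted answer against a reference answer is commonly computed from token overlap, as in SQuAD-style evaluation. The sketch below assumes that convention with simple lowercase whitespace tokenization; the text does not state how F1 is computed, and the function names are hypothetical. The reward is shown as the change in score, following the abstract's description of applying the change in performance as a reward.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted and a reference answer.

    Assumption: lowercase whitespace tokenization, no further normalization.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Two empty answers match; one empty answer does not.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often as it appears in both.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def performance_change_reward(score_before: float, score_after: float) -> float:
    """Reward given to both agents: change in QA model score after an update."""
    return score_after - score_before
```

A model-level score would average `token_f1` over a held-out evaluation set before and after the QA model is updated.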

FIG. 3A is a flowchart illustrating a method by which the question-and-answer learning model 100 of FIG. 1A extracts a question-and-answer data set from a paragraph and trains the QA model.

In step 310, the question-and-answer learning model may sample a latent variable from an arbitrary paragraph. To sample the latent variable, the question-and-answer learning model may process the contexts contained in the paragraph and express them in vector form.

The question-and-answer learning model may then generate a data set of questions and answers, that is, QA pairs, based on the latent variable extracted from the paragraph (step 320). The generated QA pairs may be used as training data for updating a QA model that generates a response to an arbitrary query.

Meanwhile, the question-and-answer learning model may determine whether to apply the generated question-and-answer data, that is, the QA pairs, to the training of the QA model (step 330).

The QA model may then be updated by learning the QA pairs (step 340).

The performance of the updated QA model may then be measured (step 350), and the measured performance may be applied as a reward to steps 310 and 330 (step 360). Accordingly, the first agent model, which samples the latent variable, is updated in the direction that raises the performance of the QA model, and the second agent model, which determines whether to apply a QA pair to the QA model, is likewise updated in the direction that raises the performance of the QA model.
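Steps 310 through 360 can be read as one round of a loop. The skeleton below is a sketch under stated assumptions rather than the patented implementation: every component is passed in as a callable, all names (`training_round`, `sample_latent`, `decode_pairs`, `should_keep`, `train_qa`, `evaluate_qa`, `give_reward`) are hypothetical, and the reward is taken to be the change in measured performance relative to the previous round.

```python
from typing import Callable, List, Sequence, Tuple

QAPair = Tuple[str, str]  # (question, answer)


def training_round(
    paragraph: str,
    sample_latent: Callable[[str], Sequence[float]],
    decode_pairs: Callable[[str, Sequence[float]], List[QAPair]],
    should_keep: Callable[[QAPair], bool],
    train_qa: Callable[[List[QAPair]], None],
    evaluate_qa: Callable[[], float],
    give_reward: Callable[[float], None],
    previous_score: float,
) -> Tuple[float, float]:
    """Run one round of steps 310-360 and return (score, reward)."""
    latent = sample_latent(paragraph)                        # step 310: first agent
    pairs = decode_pairs(paragraph, latent)                  # step 320: decoder
    kept = [pair for pair in pairs if should_keep(pair)]     # step 330: second agent
    train_qa(kept)                                           # step 340: update QA model
    score = evaluate_qa()                                    # step 350: measure performance
    reward = score - previous_score                          # assumed: reward is the change
    give_reward(reward)                                      # step 360: reward both agents
    return score, reward
```

Passing the components as callables keeps the skeleton independent of any particular model library; in practice `give_reward` would trigger a policy update of both agents.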

Meanwhile, although not separately shown in FIG. 3A, step 360 may be skipped when the performance of the QA model measured in step 350 has saturated despite continued learning of QA pairs, so that it changes only marginally, or when the performance of the QA model has risen above a preset level. In the former case, learning further QA pairs no longer helps improve the QA model; in the latter, the QA model already performs well enough that learning further QA pairs is unnecessary.
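The two skip conditions can be expressed as a small check on the history of measured scores. This is a sketch only: the text gives no numeric thresholds, so `target`, `min_delta`, and `patience` are hypothetical parameters, and the function name `should_skip_reward` is likewise an assumption.

```python
from typing import Sequence


def should_skip_reward(
    scores: Sequence[float],
    target: float,
    min_delta: float = 1e-3,
    patience: int = 3,
) -> bool:
    """Return True when step 360 can be skipped.

    Skips when the latest score has reached the preset target, or when the
    score has moved by less than min_delta over the last `patience` updates.
    """
    if not scores:
        return False
    if scores[-1] >= target:
        return True  # performance is already sufficient
    if len(scores) > patience:
        recent = scores[-(patience + 1):]
        return max(recent) - min(recent) < min_delta  # performance has saturated
    return False
```

For example, a history of 0.5, 0.6, 0.7, 0.8 with a target of 0.9 keeps the reward step active, while a history that hovers around 0.70 for four measurements does not.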

Meanwhile, FIG. 1B is a conceptual diagram schematically illustrating a question-and-answer learning model through adversarial learning according to a further embodiment of the present invention.

Referring to FIG. 1B, a question-and-answer learning model using adversarial learning according to an embodiment may include an encoder 120, an answer decoder 140, a question decoder 150, and a question-and-answer learning model 170.

First, when a paragraph 101 is input, the encoder 120 generates a latent variable 103 with reference to the constraint 122 of the preprocessing 121 stage.

When the generated latent variable 103 is input to the answer decoder 140, the answer decoder 140 generates an answer based on the latent variable 103. When the generated answer is then input to the question decoder 150, the question decoder 150 generates the question that pairs with that answer.

The question-and-answer learning model 170 is then trained on the question-and-answer pairs generated by the answer decoder 140 and the question decoder 150.

However, according to an embodiment, the question-and-answer pairs produced by the answer decoder 140 and the question decoder 150 are generated automatically by the question-and-answer data set generation model, and most of them are pairs the question-and-answer learning model can already answer correctly, so they may not help much. A method for improving machine learning performance is therefore needed.

Accordingly, a method is disclosed for generating, through adversarial learning, a question-and-answer dataset that the current question-and-answer learning model cannot answer correctly.

More specifically, FIG. 2 is a diagram for explaining how adversarial learning improves the performance of a question-and-answer model according to an embodiment.

Referring to FIG. 2, adversarial learning requires sampling data that lies outside the decision boundary of the current question-and-answer model yet on the data manifold. Specifically, in the method for generating a question-and-answer dataset from a paragraph input according to an embodiment, a latent variable is selected through the encoder, and the question-and-answer pair generated from the sampled latent variable must be one that the question-and-answer learning model fails to answer correctly. The question-and-answer learning model then corrects its decision boundary by learning the pairs it failed on. Repeating this process brings the model's decision boundary closer to the true decision boundary, and performance may improve accordingly.

Meanwhile, the question-and-answer dataset for such adversarial learning can be determined by adjusting the loss of the question-and-answer learning model and the constraints that keep the data on the data manifold.

Referring again to FIG. 1B, the adversarial question-and-answer learning model described above may have the objective function of Equation 3 below.

Here, x, y, c, and z correspond to a question, a response, a passage, and a latent variable, respectively. Applying Equation 3 to FIG. 1B, the terms of the equation (symbols omitted in the source text) denote, in order, the preprocessing 121 of the question-and-answer learning model, the encoder 120, and the functions used by the response decoder 140 and the query decoder 150, respectively. In addition, a Conditional Variational AutoEncoder (CVAE) may be used as the question-and-answer learning model 170.

The encoder 120 samples a latent variable, and the question and answer generated from it should increase the loss of the question-and-answer model. However, a question and answer generated this way may be entirely unrelated to each other, so a constraint is needed when selecting the latent variable. That is, the farther the latent variable moves from the previously learned latent distribution, the more likely it is that an invalid question-answer pair is generated. To prevent this, a penalty can be imposed as the distance between the two distributions grows (for example, using the KL divergence).

Accordingly, the question-and-answer learning model according to an embodiment adjusts the loss 123 and the constraints 122 so as to generate latent variables that are valid yet yield pairs the question-and-answer learning model cannot answer correctly.
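The trade-off between the loss 123 and the constraints 122 can be illustrated with a minimal sketch. It assumes diagonal Gaussian latent distributions and a weighting factor `penalty_weight`; neither is specified in the description, and the function names are hypothetical.

```python
import numpy as np

def kl_diag_gaussians(mu_q, var_q, mu_p, var_p):
    """KL(q || p) for two diagonal Gaussians, summed over dimensions."""
    mu_q, var_q, mu_p, var_p = (
        np.asarray(a, dtype=float) for a in (mu_q, var_q, mu_p, var_p)
    )
    return 0.5 * np.sum(
        np.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0
    )

def adversarial_score(qa_loss, mu_q, var_q, mu_p, var_p, penalty_weight=1.0):
    """Score a candidate latent distribution (assumed form, not the patent's
    Equation 3): reward a high QA-model loss, penalize drift away from the
    previously learned latent distribution."""
    penalty = kl_diag_gaussians(mu_q, var_q, mu_p, var_p)
    return qa_loss - penalty_weight * penalty
```

Under these assumptions, a candidate that raises the question-and-answer loss only by drifting far from the learned distribution scores lower than one that raises it while staying close.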

Adversarial machine learning is then performed using the dataset of adversarial queries and responses generated from these latent variables, which can improve the performance of the question-and-answer learning model.

Referring to FIG. 3B, in the method of generating a question-and-answer learning model through adversarial learning, a latent variable is sampled from the input paragraph based on a constraint in step S300. Here, the constraint may be controlled so that the latent variable increases the loss of the question-and-answer learning model while remaining within the data manifold. For example, for the question-and-answer model to learn adversarially from the generated question and answer, they must increase its loss. However, a question and answer generated this way may be entirely unrelated to each other, so a constraint is needed when selecting the latent variable. That is, the farther the latent variable moves from the previously learned latent distribution, the more likely it is that an invalid question-answer pair is generated. To prevent this, a penalty can be imposed as the distance between the two distributions grows.

Next, a response is generated based on the latent variable in step S310, and a query is generated based on the response in step S320. In step S330, the question-and-answer learning model is trained using the dataset of queries and responses generated by repeating steps S310 and S320.
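The flow of steps S300 to S320 can be sketched as a simple loop. This is only an illustration: the three callables are hypothetical stand-ins for the encoder, response decoder, and query decoder, and drawing a fresh latent variable for every pair is an assumption rather than something the description states.

```python
from typing import Callable, List, Tuple

def build_qa_dataset(
    passage: str,
    sample_latent: Callable[[str], object],          # S300 (hypothetical stand-in)
    generate_answer: Callable[[object, str], str],   # S310 (hypothetical stand-in)
    generate_question: Callable[[str, str], str],    # S320 (hypothetical stand-in)
    num_pairs: int,
) -> List[Tuple[str, str]]:
    """Collect (question, answer) pairs from one passage."""
    pairs = []
    for _ in range(num_pairs):
        z = sample_latent(passage)                     # S300: sample latent variable
        answer = generate_answer(z, passage)           # S310: latent -> response
        question = generate_question(answer, passage)  # S320: response -> query
        pairs.append((question, answer))
    return pairs
```

The returned pairs would then be passed to whatever training routine implements step S330.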

Hereinafter, a method of generating a question-and-answer dataset according to an embodiment is described in more detail with reference to FIGS. 4 to 7.

FIG. 4 is a conceptual diagram schematically illustrating a question-and-answer dataset generation model based on a paragraph input according to an embodiment.

Referring to FIG. 4, the question-and-answer dataset generation model according to an embodiment may include a memory 110, an encoder 120, an attention 130, a response decoder 140, and a query decoder 150.

In the question-and-answer dataset generation method according to an embodiment, a paragraph (passage) 101 containing several sentences is input to the encoder 120 as it is. Here, the paragraph 101 may be a phrase, a section, a paragraph, or the like.

The encoder 120 generates a hidden representation for each word in the input paragraph 101 and stores it in the memory 110. Here, the encoder 120 may construct the hidden representation from the input through a neural network. A hidden representation may thus mean a machine-readable data representation learned in a hidden layer of the neural network, and may consist of a pair of elements: a key and a value. For example, the encoder 120 may store information about the meaning of a sentence component (e.g., a word) as a hidden representation.

Next, the encoder 120 performs sampling 102 based on the hidden representations to generate a latent variable 103, and inputs the generated latent variable 103 to the response decoder 140.

The response decoder 140 uses the latent variable 103 as a query and, through the attention 130, calculates the importance of each component of the memory 110 to generate a weighted sum vector of the memory 110. For example, at every time step at which the response decoder 140 predicts an output word, the attention 130 allows the entire input sentence from the encoder 120 to be consulted again. The input is not consulted uniformly, however; more attention is paid to the input words related to the word to be predicted at that time step. To this end, the attention function computes, for a given query, its similarity to each key obtained from all hidden representations. Each similarity is then applied to the value mapped to the corresponding key, and the similarity-weighted values are summed to obtain the weighted sum vector.
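The query, key, and value computation described above can be sketched as follows. The description does not name a similarity function or a normalization, so the dot product and softmax used here are assumptions, and `attention_weighted_sum` is a hypothetical name.

```python
import numpy as np

def attention_weighted_sum(query, keys, values):
    """query: (d,), keys: (n, d), values: (n, d_v).

    Returns the attention weights (n,) and the weighted sum vector (d_v,).
    """
    query, keys, values = (
        np.asarray(a, dtype=float) for a in (query, keys, values)
    )
    scores = keys @ query                    # similarity of the query to each key
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    context = weights @ values               # similarity-weighted sum of the values
    return weights, context
```

With a query that matches the first key more closely than the second, the first value dominates the weighted sum vector.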

The response decoder 140 predicts an answer span 104 based on the weighted sum vector. Here, the answer span may mean the start point and end point at which the response is located in the input paragraph.
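As an illustration of choosing a start point and end point, the sketch below picks the best span from per-token scores. The existence of separate start and end scores, the additive scoring, and the `max_len` limit are assumptions; the description only states that the span is a start and end position in the input paragraph.

```python
import numpy as np

def best_span(start_scores, end_scores, max_len=10):
    """Return (start, end) token indices maximizing start + end score,
    subject to start <= end and a span of at most max_len tokens."""
    start_scores = np.asarray(start_scores, dtype=float)
    end_scores = np.asarray(end_scores, dtype=float)
    best, best_score = (0, 0), -np.inf
    for s in range(len(start_scores)):
        for e in range(s, min(s + max_len, len(end_scores))):
            score = start_scores[s] + end_scores[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best
```

The `start <= end` constraint keeps the result a valid contiguous span even when the highest end score precedes the highest start score.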

Next, the query decoder 150 receives the previously generated answer span 104, performs the attention 130 calculation over the memory 110, and finally generates a query 105.

The question-and-answer dataset, consisting of the generated query and response pairs, may be stored in the memory 110, and the stored dataset may be used to train the question-and-answer learning model by machine learning.

Accordingly, because the dataset generation model according to an embodiment automatically generates a dataset of queries and responses from an input paragraph, several question-and-answer pairs can be sampled from a single paragraph, allowing more diverse data to be built than with earlier manual-input models.

More specifically, the question-and-answer dataset generation method described above may be implemented through the objective function shown in Equation 4 below.

Here, x, y, c, and z correspond to a question, a response, a passage, and a latent variable, respectively. The terms of Equation 4 (symbols omitted in the source text) denote, in order, the encoder of the question-and-answer learning model; the functions used by the response decoder and the query decoder, respectively; the marginal prior distribution of the latent variable; and the marginal posterior distribution of the latent variable. For example, the encoder produces a distribution over latent variables from the input, and the decoders can use a latent variable to generate an output close to the input. Meanwhile, because computing the Kullback-Leibler (KL) divergence used for optimization (that is, for reducing the difference between the two distributions above) is very difficult, it may be approximated as in Equation 5 below.

How the paragraph input is prepared, and how the question-and-answer dataset construction model according to an embodiment can be used, are described below with reference to FIGS. 5 to 7.

FIG. 5 is a diagram illustrating an example of a method of generating a question-and-answer dataset from a passage according to an embodiment.

Referring to FIG. 5, when a passage 301 is input, a response 303 is first generated through sampling 302. A query 304 that pairs with the response can then be generated using the generated response 303.

For example, if the input paragraph contains the sentence "Yi Sun-sin (April 28, 1545 to December 16, 1598 (November 19 in the lunar calendar)) was a military official of the mid-Joseon period," the response "April 28, 1545" may first be generated through sampling. Then, based on that response, queries in various forms can be generated, such as "What is the birthday of Chungmugong Yi Sun-sin?" or "When was Yi Sun-sin born?" Using the question-and-answer generation model therefore makes it possible to build diverse datasets for training a question-and-answer model.

FIG. 6 is a diagram for explaining a method of generating a question-and-answer dataset from a structured infobox according to an embodiment.

Referring to FIG. 6, the question-and-answer dataset generation model according to an embodiment may acquire infobox data 400 by crawling a site well known as an online encyclopedia.

In particular, because the infobox data 400 is structured, it can easily be organized into a corpus of queries and responses. Moreover, because such encyclopedia sites provide detailed classification data by field or subject, input paragraphs can be generated in the preprocessing step so that questions and answers are generated and learned for a targeted field.

However, not every website provides structured infobox data 400 as shown in FIG. 6, so crawled data is generally unstructured. The following describes how the input paragraph is generated in that case.

FIG. 7 is a diagram for explaining a method of generating a question-and-answer dataset from an unstructured page according to an embodiment.

Referring to FIG. 7, in step S500 the question-and-answer dataset construction model recognizes the title and body of a web page in the crawled data. In step S510, it recognizes a table in the body based on column recognition. That is, the table is recognized in order to obtain data in a form similar to the infobox available from structured sites.

Next, in step S520, the paragraph is put into a structured form by checking whether any words satisfy a preset data frequency. For example, repeatedly occurring words such as "country name" and "capital" can be identified, and the paragraph can be structured on that basis.
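The frequency check in step S520 can be sketched as below. The record layout (a list of label and value pairs per table row), the threshold `min_count`, and the "label: value." output format are all assumptions made for illustration.

```python
from collections import Counter

def frequent_labels(records, min_count=2):
    """Labels that occur in at least min_count cells across all records,
    in first-seen order."""
    counts = Counter(label for record in records for label, _ in record)
    return [label for label, n in counts.items() if n >= min_count]

def formalize(records, labels):
    """Render each record as one sentence-like string using only the
    frequently occurring labels."""
    keep = set(labels)
    return [
        " ".join(f"{label}: {value}." for label, value in record if label in keep)
        for record in records
    ]
```

For instance, with two rows that both carry "Country name" and "Capital" cells, those two labels pass the threshold while a label that appears only once is dropped from the structured output.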

In step S530, the question-and-answer dataset generation model according to an embodiment receives the structured paragraph and generates a query and answer dataset. In this way, diverse question-and-answer datasets can be built from data that existing websites provide in structured form as infoboxes or tables.

According to another embodiment, the question-and-answer dataset generation model described above can go beyond simply generating a dataset: it can extract a query and answer dataset from data on which learning or machine translation has been completed, evaluate the accuracy of the answers corresponding to the queries, and use the evaluation results to verify the quality of the learning or translation.

For example, if the question-and-answer dataset generation model according to an embodiment can generate an accurate question-and-answer dataset from machine-translated data, the machine reading can be evaluated as working well.

FIG. 8 is a block diagram schematically illustrating the internal configuration of an apparatus for generating a question-and-answer learning model according to an embodiment.

Referring to FIG. 8, the apparatus 100 for generating a question-and-answer learning model according to an embodiment may include one or more memories 110 and a processor 190. The operations of the apparatus 100 may be performed by the processor 190 executing a program stored in the memory 110.

The processor 190 according to an embodiment generates a hidden representation for each word in an input paragraph and stores it in the memory 110.

In addition, the processor 190 samples a latent variable based on the hidden representations and generates a weighted sum vector by calculating the importance of the hidden representations stored in the memory through attention that uses the sampled latent variable as a query.

In addition, the processor 190 may generate an answer span based on the latent variable and the weighted sum vector, and may generate a query based on the generated answer span and attention over the hidden representations in the memory.

In addition, the processor 190 may further store the dataset of generated queries and responses in the memory 110 and train the question-and-answer learning model by machine learning using the stored dataset.

In addition, the processor 190 may further adjust the loss and constraints of the question-and-answer learning model to generate latent variables that are valid yet yield pairs the question-and-answer learning model cannot answer correctly, and may perform adversarial machine learning using the question-and-answer dataset generated from those latent variables.

Meanwhile, the apparatus 100 described above may include one or more processors 190 and/or one or more memories 110. The processor 190 may include the encoder 120, the response decoder 140, and the query decoder 150, and may perform the operations described above with reference to FIGS. 1 to 7 for each of the encoder 120, the response decoder 140, and the query decoder 150.

The memory 110 may include volatile and/or non-volatile memory. The one or more memories 110 may store instructions that, when executed by the one or more processors 190, cause the one or more processors 190 to perform operations. In the present disclosure, a program or instruction is software stored in the memory 110 and may include an operating system for controlling the resources of the apparatus 100, an application, and/or middleware that provides various functions to the application so that the application can use the resources of the apparatus.

The one or more processors 190 may run software (e.g., a program or instructions) to control at least one component of the apparatus 100 connected to the processor 190. The processor 190 may also perform various operations related to the present disclosure, such as computation, processing, and data generation. The processor 190 may also load data from the memory 110 or store data in the memory 110.

In an embodiment, at least one of the components of the apparatus 100 may be omitted, or another component may be added. Additionally or alternatively, some components may be integrated, or may be implemented as a single entity or as multiple entities.

Meanwhile, the steps of the method or algorithm described in connection with the embodiments of the present invention may be implemented directly in hardware, in a software module executed by hardware, or in a combination of the two. The software module may reside in random access memory (RAM), read-only memory (ROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, a hard disk, a removable disk, a CD-ROM, or any other form of computer-readable recording medium well known in the art to which the present invention pertains.

Although embodiments of the present invention have been described above with reference to the accompanying drawings, those of ordinary skill in the art to which the present invention pertains will understand that the present invention may be embodied in other specific forms without changing its technical spirit or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive.

100: question-and-answer learning model generating apparatus
110: memory
120: encoder
130: attention
140: response decoder
150: query decoder

Claims

A method of operating a question-and-answer model through reinforcement learning in an apparatus for generating a response to a query, the method comprising:
sampling, at a first agent, a latent variable from an arbitrary paragraph;
extracting a data set of queries and responses from the paragraph based on the latent variable;
determining, at a second agent, whether to apply the extracted data set of queries and responses to training of a question-and-answer model that generates a response to an arbitrary query; and
applying a change in performance of the question-and-answer model to the first agent and the second agent as a reward,
wherein the sampling of the latent variable comprises: generating a hidden representation for each word in the paragraph and storing it in a memory; and sampling the latent variable based on the hidden representation,
wherein the hidden representation includes information indicating the meaning of the word and includes a key used to calculate the importance of the hidden representation and a value associated with the key, and
wherein the second agent is configured to filter at least a portion of the extracted data set of queries and responses, based on the reward, to improve performance of the question-and-answer model.

The method of claim 1,
wherein the first agent is a machine learning model that, based on the reward, forms a latent variable space and learns to generate the latent variable from the context included in the paragraph.

The method of claim 2, wherein the extracting of the data set comprises:
generating a weighted sum vector by calculating the importance of the hidden representation stored in the memory through attention using the sampled latent variable as a query; and
generating a response span based on the latent variable and the weighted sum vector,
wherein the generating of the weighted sum vector comprises:
obtaining the key from the hidden representation;
calculating a similarity between the query and the key;
assigning the calculated similarity to the value associated with the key; and
calculating the weighted sum vector based on the value included in the hidden representation.

A method of operating a question-and-answer model through adversarial learning in an apparatus for generating a response to a query, the method comprising:
sampling a latent variable from an arbitrary paragraph based on a constraint;
generating a response based on the latent variable;
generating a query based on the response; and
training a question-and-answer learning model by machine learning using the data set of generated queries and responses,
wherein the sampling of the latent variable comprises:
generating a hidden representation for each word in the paragraph and storing it in a memory; and
sampling the latent variable based on the hidden representation,
wherein the hidden representation includes information indicating the meaning of the word and includes a key used to calculate the importance of the hidden representation and a value associated with the key, and
wherein the constraint imposes a penalty when the latent variable is distributed more than a predetermined distance from the distribution of previously learned latent variables, so that the latent variable is controlled to increase the loss of the question-and-answer learning model while remaining within a data manifold.

5. The method of claim 4,
wherein the generating of the response comprises:
generating a weighted sum vector by calculating the importance of the hidden representation stored in the memory through attention using the sampled latent variable as a query; and
generating a response span based on the latent variable and the weighted sum vector,
wherein the generating of the weighted sum vector comprises:
obtaining the key from the hidden representation;
calculating a similarity between the query and the key;
assigning the calculated similarity to the value associated with the key; and
calculating the weighted sum vector based on the value included in the hidden representation.

6. The method of claim 1 or 4,
wherein the paragraph is a structured form of infobox data extracted from data crawled from a website.

7. The method of claim 6,
wherein, when the infobox data is not extracted from the crawled data, a paragraph structured based on data frequency in a table recognized through column recognition is input as the paragraph.

8. The method of claim 7,
wherein the paragraph is machine-translated data,
the method further comprising evaluating the quality of the machine translation by evaluating the accuracy of the answer based on the generated query and response data set.

An apparatus for generating a question-and-answer learning model, comprising:
one or more processors; and
one or more memories storing instructions that, when executed by the one or more processors, cause the one or more processors to perform operations,
wherein the one or more processors are configured to sample a latent variable from an arbitrary paragraph based on a constraint, generate a response based on the latent variable, generate a query based on the response, and train a question-and-answer learning model by machine learning using the data set of generated queries and responses,
wherein the one or more processors are further configured to generate a hidden representation for each word in the paragraph, store it in a memory, and sample the latent variable based on the hidden representation, the hidden representation including information indicating the meaning of the word and including a key used to calculate the importance of the hidden representation and a value associated with the key, and
wherein the constraint imposes a penalty when the latent variable is distributed more than a predetermined distance from the distribution of previously learned latent variables, so that the latent variable is controlled to increase the loss of the question-and-answer learning model while remaining within a data manifold.

A program for generating a question-and-answer learning model through adversarial learning, the program being combined with a computer as hardware and stored in a medium to execute the method of any one of claims 1 to 5.

delete