KR102486440B1

KR102486440B1 - Method and apparatus for training unsupervised question generation model

Info

Publication number: KR102486440B1
Application number: KR1020200148808A
Authority: KR
Inventors: 맹성현; 강준모; 푸에르토 산 로만 아리츠; 홍기원
Original assignee: 한국과학기술원
Priority date: 2020-11-09
Filing date: 2020-11-09
Publication date: 2023-01-09
Also published as: WO2022097909A1; KR20220062986A

Abstract

A method of operating a training apparatus includes: determining, while a question is being generated for a document and an answer, the question type of the word tokens extracted so far; selecting, from among a plurality of question generation models, a specific question generation model whose type differs from the determined question type as the model that generates the next word token; obtaining the probability distribution over the vocabulary that the specific question generation model predicts from input information; and using the probability distribution as a regularized label for the input information and training a new question generation model with the input information and the regularized label.

Description

Method and apparatus for training unsupervised question generation model {METHOD AND APPARATUS FOR TRAINING UNSUPERVISED QUESTION GENERATION MODEL}

The present invention relates to question generation.

A question generation (QG) model aims to generate the question that best fits a given document (context) and answer, and is related to question answering (QA), which infers the answer from a given document and question.

In general, a question answering model can be trained on a question answering dataset built in advance as (document, question, answer) triples. The performance of the question answering model depends on the quality of the dataset, and building a high-quality dataset is costly.

Meanwhile, unsupervised learning that generates answers and questions from a given document without a question answering dataset has been proposed, and it can be used to create and extend question answering datasets. However, conventional unsupervised question generation models generate questions by back-translation, so the questions use the same words in the same order as the document. As a result, question answering datasets generated in an unsupervised manner are too easy, which makes it difficult to train a robust question answering model.

One problem to be solved is to provide a teacher model that regularizes different question generation models without bias, and a student model that learns question generation using the information passed on during the teacher model's regularization process.

Another problem to be solved is to provide a word token-level regularization method in which the teacher model decides, based on the bias of the word tokens generated so far, which of the different question generation models generates the probability distribution of the next word token.

Another problem to be solved is to provide a method by which the student model learns in an unsupervised manner using the word-token probability distributions passed sequentially from the teacher model.

A method of operating a training apparatus according to one embodiment includes: determining, while a question is being generated for a document and an answer, the question type of the word tokens extracted so far; selecting, from among a plurality of question generation models, a specific question generation model whose type differs from the determined question type as the model that generates the next word token; obtaining the probability distribution over the vocabulary that the specific question generation model predicts from input information; and using the probability distribution as a regularized label for the input information and training a new question generation model with the input information and the regularized label.

The input information may include the document, the answer, and the word tokens extracted so far.

The method may further include extracting a new word token predicted by the specific question generation model from the input information, adding the new word token to the word tokens extracted so far, and repeating the question generation process.

The regularized label may be generated by a question generation model selected from among the plurality of question generation models.

The plurality of question generation models may include a language model type question generation model and a copy-type question generation model that generates questions by back-translation.

The selecting of the model that generates the next word token may include determining toward which of the plurality of question generation models the word tokens generated so far are biased, and selecting the question generation model that generates the next word token so as to remove the bias.

A method of operating a training apparatus according to another embodiment includes: sequentially extracting, by combining a plurality of question generation models of different types, word tokens that form a question about an answer from a document; obtaining, each time a word token is extracted, the probability distribution of the question generation model, among the plurality of question generation models, that predicted that word token; and providing the probability distribution obtained each time a word token is extracted to a new question generation model that learns to generate questions for the document.

The sequentially extracting of the word tokens may include: determining toward which of the plurality of question generation models the word tokens extracted so far are biased, and determining the specific question generation model that generates the next word token so as to remove the bias; and extracting a new word token predicted by the specific question generation model from input information.

The determining of the specific question generation model may include determining the question type of the word tokens extracted so far and selecting, from among the plurality of question generation models, a model whose type differs from the determined question type as the specific question generation model.

The probability distribution obtained each time a word token is extracted may be used to train the new question generation model as a label that regularizes the plurality of question generation models.

A training apparatus operated by at least one processor according to yet another embodiment includes: a teacher model that combines a plurality of question generation models of different types to sequentially extract, from a document, word tokens that form a question about an answer, and that obtains, each time a word token is extracted, the probability distribution of the question generation model that predicted that word token; and a student model that receives from the teacher model the probability distribution obtained each time a word token is extracted as a label for input information, and learns from the loss between that label and the probability distribution of the next word token that it predicts from the input information.

The teacher model may determine toward which of the plurality of question generation models the word tokens extracted so far are biased, determine the specific question generation model that generates the next word token so as to remove the bias, and extract a new word token predicted by the specific question generation model from input information.

The teacher model may determine the question type of the word tokens extracted so far and select, from among the plurality of question generation models, a model whose type differs from the determined question type as the specific question generation model.

The plurality of question generation models may include a language model type question generation model and a copy-type question generation model that generates questions by back-translation, and the teacher model and the student model may be connected in a pipeline.

According to the embodiments, various question generation models can be combined and regularized in an unsupervised setting.

According to the embodiments, a question generation model that generalizes the features of various question generation models can be produced on the basis of the teacher model and student model structure.

A model produced according to the embodiments can be widely applied in natural language processing, for example in dialogue systems such as chatbots, QA systems, and information retrieval systems.

In particular, a model produced according to the embodiments provides question generation and question answering in an unsupervised learning setting, and can therefore be extended to many domains and languages.

A model produced according to the embodiments can be applied to languages that lack question answering datasets and used to develop QA systems for speakers of those languages.

FIG. 1 is a configuration diagram of a training apparatus according to one embodiment.
FIG. 2 is a diagram illustrating an instance-level regularization method and a word token-level regularization method.
FIG. 3 is a diagram illustrating the training of an unsupervised question generation model according to one embodiment.
FIG. 4 is a flowchart illustrating a method of operating the training apparatus according to one embodiment.
FIG. 5 is a flowchart illustrating a method of operating the teacher model according to one embodiment.
FIG. 6 is a flowchart illustrating a method of operating the student model according to one embodiment.

Hereinafter, embodiments of the present disclosure are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present disclosure pertains can readily practice them. However, the present disclosure may be implemented in many different forms and is not limited to the embodiments described herein. In the drawings, parts irrelevant to the description are omitted for clarity, and similar parts are given similar reference numerals throughout the specification.

In the description, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise.

In the description, "transmitting or providing" covers not only direct transmission or provision but also indirect transmission or provision through another device or over a detour path.

In the description, an expression written in the singular may be interpreted as singular or plural unless an explicit expression such as "one" or "single" is used.

In the description, the order of the operations shown in the flowcharts may be changed, several operations may be merged, an operation may be split, and a particular operation may be omitted.

In the description, terms such as "...unit", "...er", and "...module" refer to a unit that processes at least one function or operation, which may be implemented in hardware, in software, or in a combination of hardware and software.

In the description, an apparatus is configured and connected so that at least one processor can perform the operations of the present disclosure by executing instructions. A computer program includes instructions written to cause a processor to execute the operations of the present disclosure, and may be stored in a non-transitory computer readable storage medium. The computer program may be downloaded over a network or sold as a product.

A “model” in the present disclosure is a machine learning model that learns at least one task, and may be implemented as a computer program executed by a processor. A “model” in the present disclosure may be built from various neural network-based models to suit the input data, the type of task, the training method, and so on.

A question generation (QG) model generates a question corresponding to a document (context, C) and an answer (A), and it can do so in various ways. The present disclosure uses two models that generate questions in an unsupervised setting as examples: a copy-type question generation model (copy-type QG), which generates questions by back-translation, and a language model type question generation model (LM-type QG). However, the types of question generation models and the number of models combined may be varied.

First, the copy-type QG generates questions by back-translation. It therefore generates questions that use the same words in the same order as the document. For example, from a document (context) containing “~Level 1 of DDM Architecture was formally published in 1986. ~”, it generates “When level 1 of DDM Architecture was formally published?”, in which the words “level 1 of DDM Architecture was formally published” are copied verbatim from the document.

The LM-type QG generates questions by feeding the document into a pre-trained language model. However, because the language model has not been trained to generate questions, it produces questions that differ too much from the document. For example, from a document containing “~Level 1 of DDM Architecture was formally published in 1986. ~”, it may generate “When did the first level 1 of DDM Architecture come out?”.
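The contrast between the two example questions can be made concrete with a rough lexical-overlap measure. The following is only an illustrative sketch: the `copy_ratio` function and the crude tokenizer are assumptions introduced here and are not part of the disclosure.

```python
import re

def tokenize(text):
    """Lowercase and split on non-word characters (a deliberately crude tokenizer)."""
    return re.findall(r"\w+", text.lower())

def copy_ratio(question, context):
    """Fraction of question tokens that also occur in the context (hypothetical metric)."""
    context_tokens = set(tokenize(context))
    question_tokens = tokenize(question)
    if not question_tokens:
        return 0.0
    copied = sum(tok in context_tokens for tok in question_tokens)
    return copied / len(question_tokens)

context = "Level 1 of DDM Architecture was formally published in 1986."
copy_q = "When level 1 of DDM Architecture was formally published?"
lm_q = "When did the first level 1 of DDM Architecture come out?"

# The copy-type question reuses a larger share of the context than the LM-type one.
print(copy_ratio(copy_q, context), copy_ratio(lm_q, context))
```

Under this measure the copy-type example scores higher than the LM-type example, which matches the description of one model staying too close to the document and the other drifting away from it.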

However, because no ground-truth questions are available in an unsupervised setting, it is not easy to solve the problems of the copy-type QG and the LM-type QG.

The present disclosure describes in detail a method that combines and regularizes different kinds of question generation models to overcome the weaknesses of each, and that uses the information produced during regularization to generalize the characteristics of those models into a single question generation model (the student model).

The description uses question generation models as the example of generative models, but the framework proposed in the present disclosure can be generalized to various information generation fields such as text generation. For example, the method of combining and regularizing question generation models can be used as a method of ensembling text generation models.

FIG. 1 is a configuration diagram of a training apparatus according to one embodiment.

Referring to FIG. 1, the training apparatus 10 may be implemented as a computing device operated by at least one processor. The training apparatus 10 may include one or more processors 11, a memory 13 into which the computer program executed by the processor 11 is loaded, a storage device 15 that stores the computer program and various data, a communication interface 17, and a bus 19 connecting them. The training apparatus 10 may further include various other components.

The processor 11 controls the operation of the training apparatus 10 and may be any of various types of processor that process the instructions contained in a computer program. For example, it may include at least one of a CPU (Central Processing Unit), an MPU (Micro Processor Unit), an MCU (Micro Controller Unit), a GPU (Graphic Processing Unit), or any other type of processor well known in the technical field of the present disclosure.

The memory 13 stores various data, commands, and/or information. The memory 13 may load the computer program from the storage device 15 so that the instructions written to execute the operations of the present disclosure are processed by the processor 11. The memory 13 may be, for example, ROM (read only memory) or RAM (random access memory).

The storage device 15 may store the computer program and various data in a non-transitory manner. The storage device 15 may include a non-volatile memory such as ROM (Read Only Memory), EPROM (Erasable Programmable ROM), EEPROM (Electrically Erasable Programmable ROM), or flash memory, a hard disk, a removable disk, or any other type of computer-readable recording medium well known in the technical field to which the present disclosure pertains.

The communication interface 17 may be a wired/wireless communication module that supports wired/wireless communication.

The bus 19 provides communication between the components of the training apparatus 10.

The computer program includes instructions executed by the processor 11 and is stored in a non-transitory computer readable storage medium; the instructions cause the processor 11 to execute the operations of the present disclosure. The computer program may be downloaded over a network or sold as a product.

A “model” described in the present disclosure may be implemented as a computer program executed by the processor 11. In the description, the training apparatus 10, the processor 11, or the teacher model, student model, generative model, or regularization model may be described as the subject that performs an operation.

FIG. 2 is a diagram illustrating an instance-level regularization method and a word token-level regularization method.

Referring to FIG. 2(a), instance-level regularization is one possible way to use different kinds of generative models and generalize their characteristics into a single generative model (called the “student model” in this description).

The instance-level regularization method merges the datasets of the different generative models (for example, copy-type QG and LM-type QG) and selects instances (for example, question sentences) generated by each generative model to train the student model (student QG).
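As a rough illustration of instance-level regularization, the sketch below pools whole questions from two generators and draws the same number from each. The function name `select_instances` and the equal-count selection rule are assumptions made for this example; the text does not specify how instances are chosen.

```python
import random

def select_instances(copy_questions, lm_questions, per_model, seed=0):
    """Pick up to `per_model` whole questions from each generator and shuffle them.

    Equal-count sampling is an assumed selection rule, used only for illustration.
    """
    rng = random.Random(seed)
    chosen = []
    for source, questions in (("copy", copy_questions), ("lm", lm_questions)):
        k = min(per_model, len(questions))
        chosen += [(source, q) for q in rng.sample(questions, k)]
    rng.shuffle(chosen)
    return chosen

copy_qs = ["copy question 1", "copy question 2", "copy question 3"]
lm_qs = ["lm question 1", "lm question 2", "lm question 3", "lm question 4"]
student_training_set = select_instances(copy_qs, lm_qs, per_model=2)
```

Each selected instance is an entire sentence from one generator, so any single training example still carries the full bias of the model that produced it.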

Referring to FIG. 2(b), the different kinds of generative models can instead be regularized with a finer-grained word token-level regularization method rather than instance-level regularization, which selects whole sentences. The student model must be trained on balanced data in order to overcome the weakness of each generative model, namely generating questions that are either too similar to or too different from the document.

Therefore, the teacher model makes its decisions so that the word tokens forming a question are generated evenly by the different generative models, so that the student model is not biased toward any one of them. The following describes in detail how the teacher model regularizes the plurality of generative models so that the student model is not biased toward a specific generative model, and how the student model is trained as a result.

FIG. 3 is a diagram illustrating the training of an unsupervised question generation model according to one embodiment.

Referring to FIG. 3, the training apparatus 10 trains a student model 200 using a teacher model 100. The teacher model 100 regularizes a plurality of generative models so that the student model 200 is not biased toward a specific generative model among them. The teacher model 100 and the student model 200 may be connected in a pipeline. Note that, unlike ordinary knowledge distillation, in which the student model learns by imitating the teacher model, here the teacher model 100 selects probability distributions in order to make the student model 200 an unbiased generative model; it is this relationship that is referred to as the teacher model and the student model.

The teacher model 100 may consist of a plurality of generative models 110 and 130 and a regularization model 150. The generative model 110 is assumed to be a language model type question generation model (LM-type QG). The generative model 130 is assumed to be a copy-type question generation model (copy-type QG).

Each of the generative models 110 and 130 outputs, based on the input information, the probability distribution of the word token to be generated next, and provides this distribution to the regularization model 150. The probability distribution of a word token is a probability distribution over all tokens in the vocabulary. Input information is fed to each generative model at every time step. At the current time step (step=t), the input information may be the document (Context, C), the answer (Answer, A), and the word tokens generated consecutively so far (q_{<t}). The answer may be any named entity recognized in the document.
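The shape of the input information and of a generator's output can be sketched as follows. This is a toy stand-in rather than either of the generative models 110 and 130: the `QGInput` container, the five-word `VOCAB`, and the scoring rule in `toy_copy_generator` are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import List

@dataclass
class QGInput:
    context: str        # document C
    answer: str         # answer A
    prefix: List[str]   # word tokens generated so far, q_{<t}

# Hypothetical toy vocabulary; a real model would use its tokenizer's vocabulary.
VOCAB = ["when", "was", "level", "published", "?"]

def softmax(logits):
    """Turn arbitrary scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def toy_copy_generator(x: QGInput) -> List[float]:
    """Toy copy-style scorer: favors vocabulary tokens that occur in the context."""
    context_tokens = set(x.context.lower().split())
    logits = [2.0 if tok in context_tokens else 0.0 for tok in VOCAB]
    return softmax(logits)

x = QGInput(
    context="Level 1 of DDM Architecture was formally published in 1986.",
    answer="1986",
    prefix=["when"],
)
dist = toy_copy_generator(x)  # one probability per vocabulary entry
```

The point of the sketch is only that every generator returns a full distribution over the same vocabulary for the same (C, A, q_{<t}) input, which is what allows the regularization model to pass either one on as a label.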

The regularization model 150 determines toward which of the generative models the word tokens generated so far (q_{<t}) are biased, and selects the generative model that generates the next word token so as to remove the bias. The regularization model 150 may be an artificial neural network model that prevents a question made up of word tokens from being easily identified as LM-type or copy-type. For example, the regularization model 150 may be implemented as the discriminator of a generative adversarial network (GAN).

The regularization model 150 computes the probabilities that the word tokens generated up to the current time step (step=t), q_{<t}, are LM-type and copy-type. If the LM-type probability is higher, the next word token (q_{t'}) is chosen to maximize the copy-type score; if the copy-type probability is higher, the next word token is chosen to maximize the LM-type score.
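The selection rule described above can be written as a small function. This is a minimal sketch under stated assumptions: the two type probabilities are taken as given (in the disclosure they would come from the regularization model 150), and the tie-breaking choice is arbitrary because the text does not say what happens when the probabilities are equal.

```python
def choose_next_generator(p_lm: float, p_copy: float) -> str:
    """Return the generator type opposite to the type the prefix currently leans toward.

    Ties are sent to the copy-type generator; this tie-break is an assumption.
    """
    return "copy" if p_lm >= p_copy else "lm"

# A prefix judged 80% LM-type hands the next token to the copy-type generator.
print(choose_next_generator(0.8, 0.2))
# A prefix judged 70% copy-type hands the next token to the LM-type generator.
print(choose_next_generator(0.3, 0.7))
```

Alternating in this way keeps the partially generated question from being easily classified as either type.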

The regularization model 150 generates a regularized label for the input information (C, A, q_{<t}) and passes the regularized label to the student model 200. The regularized label is the probability distribution of the next word token (q_t'). The regularization model 150 selects the generation model that will produce the next word token, and passes the word-token probability distribution output by the selected generation model to the student model 200, which is connected through a pipeline.

Each of the plurality of generation models 110 and 130 constituting the teacher model 100 repeats the process of generating the probability distribution of the next word token from the input information (C, A, q_{<t}) until the question is complete. Until the question is complete, the regularization model 150 repeats the regularization process: from the input information (C, A, q_{<t}), it selects the generation model that removes the bias of the word tokens generated so far (q_{<t}), and passes the next-word-token probability distribution produced by the selected generation model to the student model 200.

The student model 200 is a single generation model that learns the characteristics of the various generation models; in this description it is assumed to be a question generation model. The student model 200 infers the probability distribution of the next word token based on the input information (C, A, q_{<t}) at the current time step (step = t). In doing so, the student model 200 is repeatedly trained to minimize the loss between its inferred distribution and the probability distribution received from the regularization model 150 as the regularized label. The student model 200 may use a KL-divergence loss for this training.
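As an illustration of the KL-divergence loss mentioned above, the following sketch computes the divergence between a regularized label and a student prediction over a small vocabulary. The direction KL(teacher || student), the function name `kl_divergence`, and the smoothing constant `eps` are assumptions; the description only states that a KL-divergence loss may be used.

```python
import math
from typing import Sequence


def kl_divergence(teacher: Sequence[float], student: Sequence[float],
                  eps: float = 1e-12) -> float:
    """KL(teacher || student) for two distributions over the same vocabulary.

    eps guards against log(0) and division by zero (an assumed detail).
    """
    return sum(
        p * math.log((p + eps) / (q + eps))
        for p, q in zip(teacher, student)
        if p > 0.0  # terms with zero teacher probability contribute nothing
    )
```

The loss is zero when the student reproduces the regularized label exactly and grows as the two distributions diverge.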

In this way, the student model 200 is provided with word-token-level probability distributions selected so that the question style is not biased toward any one generation model, and is trained to minimize the loss against the provided distributions. Accordingly, even in an unsupervised setting, the student model 200 can generate questions that mix the two question types, one too similar to the document and one too different from it.

FIG. 4 is a flowchart illustrating a method of operating a learning device according to an embodiment.

Referring to FIG. 4, the learning device 10 determines the question type of the word tokens extracted so far, and determines a question generation model of a type different from the determined question type as the model to generate the next word token (S110).

The learning device 10 obtains the probability distribution over the vocabulary that the determined question generation model predicts from the input information (S120). The input information may consist of the document, the answer, and the word tokens extracted so far.

The learning device 10 generates the probability distribution predicted by the determined question generation model as a regularized label for the input information (S130).

The learning device 10 trains a new question generation model (the student model) using the regularized label for the input information (S140).

FIG. 5 is a flowchart illustrating a method of operating a teacher model according to an embodiment.

Referring to FIG. 5, the teacher model 100 receives a document as input (S210).

The teacher model 100 selects an arbitrary named entity recognized in the document as the answer, and generates initial input information comprising the document and the answer (S220).

The teacher model 100 feeds the input information to the plurality of generation models and, based on the question type of the word tokens generated so far, determines the specific generation model, among the plurality of generation models, that will generate the next word token (S230). The teacher model 100 determines the question type (LM-type or Copy-type) from the word tokens generated so far (q_{<t}), and uses the next word token predicted by the generation model whose type differs from the determined type.

The teacher model 100 extracts the new word token predicted by the specific generation model, and builds the question by appending the new word token to the previously extracted word tokens (S240).

Meanwhile, the teacher model 100 stores the probability distribution that the specific generation model predicted for the input information, and provides that distribution to the student model 200 as training data (S250). Rather than passing the single selected word token as the label of the input information, the teacher model 100 provides the probability distribution as the label. This allows the student model 200 to learn the probabilities of all word tokens in the vocabulary.
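The difference between passing a single selected token and passing the whole distribution can be seen in a small sketch. The vocabulary, the probabilities, and the helper name `hard_label` are illustrative assumptions and do not come from the patent.

```python
from typing import Dict


def hard_label(dist: Dict[str, float]) -> Dict[str, float]:
    """Collapse a distribution to a one-hot label on its most probable token."""
    best = max(dist, key=dist.get)
    return {tok: (1.0 if tok == best else 0.0) for tok in dist}


# Illustrative next-token distribution from the selected generation model.
soft = {"who": 0.55, "what": 0.40, "?": 0.05}
hard = hard_label(soft)
```

In `hard`, the information that "what" was almost as likely as "who" is lost, whereas the soft label `soft` keeps a probability for every token in the vocabulary, which is what the student model is given.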

The teacher model 100 determines whether question generation is complete at the current time step (S260). The teacher model 100 may determine that question generation is complete when a question mark is generated.

If question generation is not complete, the teacher model 100 adds the selected word token to the input information and repeats the question generation process (S230) in order to extract the next word token (S270). The input information may consist of the document, the answer, and the word tokens selected so far.

When question generation is complete, the teacher model 100 ends the process of generating a question from the document (S280).
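The teacher-side flow of steps S230 to S280 can be sketched as a loop. This is a simplified illustration under several assumptions: the generation models and the type classifier are passed in as plain callables, the answer is supplied by the caller (named entity recognition for S220 is out of scope), the next token is chosen greedily, and `max_steps` is an added safety bound. All names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Distribution = Dict[str, float]  # token -> probability
Generator = Callable[[str, str, List[str]], Distribution]


def teacher_generate(
    document: str,
    answer: str,
    generators: Dict[str, Generator],
    classify_type: Callable[[List[str]], str],
    max_steps: int = 32,
) -> Tuple[List[str], List[Tuple[List[str], Distribution]]]:
    """Build one question token by token and collect soft labels."""
    tokens: List[str] = []
    examples: List[Tuple[List[str], Distribution]] = []
    for _ in range(max_steps):
        # S230: judge the type of the partial question, use the other generator.
        current_type = classify_type(tokens)
        chosen = next(name for name in generators if name != current_type)
        dist = generators[chosen](document, answer, tokens)
        # S250: keep the whole distribution as the label for this input.
        examples.append((list(tokens), dist))
        # S240: append the most probable token (greedy choice is an assumption).
        next_token = max(dist, key=dist.get)
        tokens.append(next_token)
        # S260/S280: a question mark ends the question; otherwise loop (S270).
        if next_token == "?":
            break
    return tokens, examples


# Toy stand-ins for the two generation models and the discriminator.
def lm_gen(doc: str, ans: str, toks: List[str]) -> Distribution:
    return {"?": 0.9, "who": 0.1} if toks else {"who": 0.6, "?": 0.4}


def copy_gen(doc: str, ans: str, toks: List[str]) -> Distribution:
    return {"wrote": 0.8, "?": 0.2}


question, examples = teacher_generate(
    "a document", "an answer",
    {"lm": lm_gen, "copy": copy_gen},
    classify_type=lambda toks: "lm" if len(toks) % 2 == 0 else "copy",
)
```

Each entry of `examples` pairs the tokens generated so far with the distribution predicted by the selected generator, which is the form of training data the student model receives.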

FIG. 6 is a flowchart illustrating a method of operating a student model according to an embodiment.

Referring to FIG. 6, the student model 200 predicts the probability distribution of the next word token using the input information received from the teacher model 100 (S310).

The student model 200 computes the loss between the probability distribution for the input information received from the teacher model 100 and its own predicted probability distribution, and is trained on that loss (S320). The probability distribution for the input information is the distribution over the vocabulary predicted by the generation model that predicted the next word token from the input information. Which generation model provides the probability distribution for the input information is decided by the teacher model 100. The probability distribution for the input information that the student model 200 receives is therefore a label regularized by the teacher model 100.
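A single student update in step S320 can be illustrated with a deliberately tiny model. The sketch below assumes the student is just a vector of logits over a three-token vocabulary, trained by plain gradient descent; a real student model would be a neural network conditioned on the input information. It relies on the standard result that the gradient of KL(label || softmax(logits)) with respect to the logits is `softmax(logits) - label` when the label sums to 1. The names `softmax`, `student_step`, and `lr` are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def student_step(logits: List[float], label: List[float],
                 lr: float = 0.5) -> List[float]:
    """One gradient-descent step toward the regularized label."""
    probs = softmax(logits)
    # Gradient of the KL loss w.r.t. each logit is (predicted - label).
    return [z - lr * (p - t) for z, p, t in zip(logits, probs, label)]


label = [0.7, 0.2, 0.1]        # illustrative regularized label
logits = [0.0, 0.0, 0.0]       # student starts from a uniform prediction
updated = softmax(student_step(logits, label))
```

After the step, the student assigns more probability to the first token and less to the third, moving its prediction toward the label supplied by the teacher model.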

Thus, according to the embodiment, various question generation models can be combined and regularized in an unsupervised setting. According to the embodiment, a question generation model that generalizes the characteristics of various question generation models can be created on the basis of the teacher-student model structure. A model created according to the embodiment can be widely applied in natural language processing, for example in dialogue systems such as chatbots, QA systems, and information retrieval systems. Because such a model supports question generation and question answering in an unsupervised learning setting in particular, it is extensible to many domains and languages. A model created according to the embodiment can be applied to languages that lack question answering datasets and used to develop QA systems for speakers of those languages.

The embodiments of the present invention described above are not implemented only through an apparatus and a method; they may also be implemented through a program that realizes functions corresponding to the configuration of the embodiments, or through a recording medium on which such a program is recorded.

Although embodiments of the present invention have been described in detail above, the scope of the present invention is not limited thereto; various modifications and improvements made by those skilled in the art using the basic concept of the present invention as defined in the following claims also fall within the scope of the present invention.

Claims

A method of operating a learning device, the method comprising:
determining, in a process of generating a question for a document and an answer, a question type of the word tokens extracted so far;
determining, among a plurality of question generation models, a specific question generation model of a type different from the determined question type as the model to generate a next word token;
obtaining a probability distribution over a vocabulary predicted by the specific question generation model from input information; and
generating the probability distribution as a regularized label for the input information, and training a new question generation model using the input information and the regularized label.

The method of claim 1,
wherein the input information comprises
the document, the answer, and the word tokens extracted so far.

The method of claim 1, further comprising
extracting a new word token predicted by the specific question generation model from the input information, adding the new word token to the word tokens extracted so far, and repeating the question generation process.

The method of claim 1,
wherein the regularized label is
generated from a question generation model selected from among the plurality of question generation models.

The method of claim 1,
wherein the plurality of question generation models comprise
a language-model-type question generation model and a copy-type question generation model that generates questions based on back-translation.

The method of claim 1,
wherein determining the model to generate the next word token comprises
determining toward which question generation model, among the plurality of question generation models, the word tokens generated so far are biased, and selecting the question generation model to generate the next word token in a direction that removes the bias.

A method of operating a learning device, the method comprising:
sequentially extracting word tokens constituting a question for an answer from a document by combining a plurality of question generation models of different types;
obtaining, each time a word token is extracted, the probability distribution of the question generation model, among the plurality of question generation models, that predicted that word token; and
providing the probability distribution obtained each time a word token is extracted to a new question generation model that learns question generation for the document.

The method of claim 7,
wherein sequentially extracting the word tokens comprises:
determining toward which question generation model, among the plurality of question generation models, the word tokens extracted so far are biased, and determining a specific question generation model to generate the next word token in a direction that removes the bias; and
extracting a new word token predicted by the specific question generation model from input information.

The method of claim 8,
wherein determining the specific question generation model comprises
determining a question type of the word tokens extracted so far, and determining, among the plurality of question generation models, a question generation model of a type different from the determined question type as the specific question generation model.

The method of claim 7,
wherein the probability distribution obtained each time a word token is extracted is used for training the new question generation model as a label that regularizes the plurality of question generation models.

The method of claim 7,
wherein the plurality of question generation models comprise
a language-model-type question generation model and a copy-type question generation model that generates questions based on back-translation.

A learning device operated by at least one processor, the learning device comprising:
a teacher model that combines a plurality of question generation models of different types to sequentially extract word tokens constituting a question for an answer from a document, and that obtains, each time a word token is extracted, the probability distribution of the question generation model, among the plurality of question generation models, that predicted that word token; and
a student model that receives from the teacher model, as a label for input information, the probability distribution obtained each time a word token is extracted, and that learns the loss between the label and the probability distribution of the next word token predicted from the input information.

The learning device of claim 12,
wherein the teacher model
determines toward which question generation model, among the plurality of question generation models, the word tokens extracted so far are biased, determines a specific question generation model to generate the next word token in a direction that removes the bias, and extracts a new word token predicted by the specific question generation model from input information.

The learning device of claim 13,
wherein the teacher model
determines a question type of the word tokens extracted so far, and determines, among the plurality of question generation models, a question generation model of a type different from the determined question type as the specific question generation model.

The learning device of claim 12,
wherein the plurality of question generation models comprise
a language-model-type question generation model and a copy-type question generation model that generates questions based on back-translation, and
wherein the teacher model and the student model are connected through a pipeline.