KR20220114157A

KR20220114157A - Commonsense question answer reasoning method and apparatus

Info

Publication number: KR20220114157A
Application number: KR1020210017327A
Authority: KR
Inventors: 박영택; 바트셀렘; 김민성; 이민호
Original assignee: 숭실대학교산학협력단
Priority date: 2021-02-08
Filing date: 2021-02-08
Publication date: 2022-08-17
Also published as: KR102464998B1

Abstract

A commonsense question-answer reasoning method and apparatus are disclosed. The present invention provides an explainable commonsense question-answer reasoning completion device comprising a processor and a memory connected to the processor. The memory stores program instructions executable by the processor to: extract a plurality of entities from a plurality of combinations of a question and each of a plurality of answers; extract, for each of the combinations, one or more paths between the extracted entities by referring to a knowledge graph; convert the one or more paths into a sentence through data augmentation; generate a first embedding vector by combining the sentence and the question; generate a second embedding vector for each of the combinations; input the first embedding vector and the second embedding vector to a multi-head attention module to calculate an attention score for each of the combinations; and infer one of the answers corresponding to the question as the correct answer through the attention score. Accordingly, the present invention can grasp the relationship between a question and its correct answer.

Description

Commonsense question answer reasoning method and apparatus

The present invention relates to a commonsense question-answer reasoning method and apparatus.

Commonsense question answering is one of the major challenges facing artificial intelligence; it aims to answer accurately questions whose answers are not explicitly stated in the question.

Commonsense question answering is a system that, given a question and candidate answers, automatically selects one correct answer to the question. The CommonsenseQA dataset, which contains n questions, is used for the commonsense question answering task, and each question has five candidate answers: one correct answer and four incorrect answers.

Answering a commonsense question requires external commonsense knowledge or facts.

Such knowledge and facts for answering these questions can be found in a knowledge graph.

A knowledge graph holds factual information in the form of triples (s, r, t), but an incomplete knowledge graph is difficult to use in an AI-based system.
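As a minimal sketch of the triple form, the Python below stores a few (s, r, t) facts and indexes them by source entity. Reading s, r, and t as source entity, relation, and target entity is an assumption, as are the names Triple, build_index, TRIPLES, and INDEX; the sample facts are taken from the paths shown for FIG. 2.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    s: str  # source entity (assumed reading of "s")
    r: str  # relation
    t: str  # target entity (assumed reading of "t")


def build_index(triples):
    """Map each source entity to its outgoing (relation, target) edges."""
    index = defaultdict(list)
    for triple in triples:
        index[triple.s].append((triple.r, triple.t))
    return dict(index)


# Sample facts taken from the example paths of FIG. 2.
TRIPLES = [
    Triple("adult", "capableof", "work"),
    Triple("work", "atlocation", "office"),
    Triple("glue_stick", "atlocation", "office"),
]
INDEX = build_index(TRIPLES)
```

An entity that never appears as a source, such as "office" here, has no entry in the index, which is one simple way incompleteness of the graph shows up during lookup.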

The prior art may provide methods that extract path information from a knowledge graph and use it to answer commonsense questions, but because the knowledge graph is incomplete, it is difficult to find a meaningful answer in the extracted path information.

To address this problem, the prior art increased the number of paths, but the generated paths are not learned well by a neural network model.

Republic of Korea Patent Publication No. 10-2019-0133931

To solve the above problems of the prior art, the present invention proposes a commonsense question-answer reasoning method and apparatus capable of grasping the relationship between a question and its correct answer.

To achieve the above object, according to an embodiment of the present invention, there is provided an explainable commonsense question-answer reasoning completion device comprising a processor and a memory connected to the processor, wherein the memory stores program instructions executable by the processor to: extract a plurality of entities from a plurality of combinations of a question and each of a plurality of answers; extract, for each of the plurality of combinations, one or more paths between the extracted entities by referring to a knowledge graph; convert the one or more paths into a sentence through data augmentation; generate a first embedding vector by combining the sentence and the question; generate a second embedding vector for each of the plurality of combinations; input the first embedding vector and the second embedding vector to a multi-head attention module to calculate an attention score for each of the plurality of combinations; and infer one of the plurality of answers corresponding to the question as the correct answer through the attention score.

The program instructions may tokenize the question to extract a plurality of entities and check whether words corresponding to the entities extracted from the question exist in the knowledge graph.

The program instructions may select, from among the paths connecting an entity extracted from the question and an entity extracted from the answer, a path whose length is equal to or less than a preset length, and convert the selected path into a sentence.

The program instructions may convert the one or more paths into a sentence through back-translation, by translating the one or more paths expressed in a first language into a second language and then translating the result back into the first language.

The program instructions may generate the first embedding vector and the second embedding vector using a RoBERTa model. The first input data for generating the first embedding vector may consist of a start token <s>, the question, a token <sep> that separates different segments, the sentence, and an end token </s>. The second input data for generating the second embedding vector may consist of a start token <s>, the question, the token <sep>, one of the plurality of answers, and an end token </s>.

The multi-head attention module may include a plurality of learning models corresponding to the plurality of combinations, and the learning models may calculate a plurality of attention scores by receiving as input the second embedding vector for each of the plurality of combinations and the second embedding vector generated from the sentence into which the one or more paths extracted from each combination have been converted.

According to another aspect of the present invention, there is provided a method of inferring an answer to a commonsense question in a device including a processor and a memory, the method comprising: extracting, for each of the plurality of combinations, one or more paths between the plurality of extracted entities with reference to a knowledge graph; converting the one or more paths into a sentence through data augmentation; generating a first embedding vector by combining the sentence and the question; generating a second embedding vector for each of the plurality of combinations; calculating an attention score for each of the plurality of combinations by inputting the first embedding vector and the second embedding vector to a multi-head attention module; and inferring one of the plurality of answers corresponding to the question as the correct answer through the attention score.

According to yet another aspect of the present invention, there is provided a computer-readable program for performing the above method.

According to the present invention, by using data augmentation and question-answer embedding values with a multi-head attention mechanism, it is possible to solve the problem that meaningful paths are difficult to find when an incomplete knowledge graph is used.

FIG. 1 is a diagram illustrating the configuration of a commonsense question-answer reasoning apparatus according to a preferred embodiment of the present invention.
FIG. 2 is a diagram for explaining the commonsense question-answer reasoning process according to the present embodiment.
FIG. 3 is a diagram for explaining the process of embedding a path according to the present embodiment.
FIG. 4 is a diagram for explaining the process of embedding a combination of a question and an answer according to the present embodiment.

Since the present invention can be modified in various ways and can have various embodiments, specific embodiments are illustrated in the drawings and described in detail.

However, this is not intended to limit the present invention to specific embodiments, and it should be understood to include all modifications, equivalents, and substitutes falling within the spirit and technical scope of the present invention. In describing each drawing, like reference numerals are used for like elements.

FIG. 1 is a diagram illustrating the configuration of a commonsense question-answer reasoning apparatus according to a preferred embodiment of the present invention.

As shown in FIG. 1, the apparatus according to the present embodiment may include a processor 100 and a memory 102.

The processor 100 may include a central processing unit (CPU) capable of executing a computer program, or another virtual machine or the like.

The memory 102 may include a non-volatile storage device such as a fixed hard drive or a removable storage device. The removable storage device may include a compact flash unit, a USB memory stick, and the like. The memory 102 may also include volatile memory such as various random access memories.

The memory 102 stores program instructions executable by the processor 100.

The program instructions according to the present embodiment extract a plurality of entities from a plurality of combinations of a question and each of a plurality of answers; extract, for each of the plurality of combinations, one or more paths between the extracted entities by referring to a knowledge graph; convert the one or more paths into a sentence through data augmentation; generate a first embedding vector by combining the sentence and the question; generate a second embedding vector for each of the plurality of combinations; input the first embedding vector and the second embedding vector to a multi-head attention module to calculate an attention score for each of the plurality of combinations; and infer one of the plurality of answers corresponding to the question as the correct answer through the attention score.

Commonsense question answering generally selects one of five answers as the correct answer, so path extraction, sentence conversion, embedding, and attention score calculation may be performed for each of the five combinations of one question with the five answers.

FIG. 2 is a diagram for explaining the commonsense question-answer reasoning process according to the present embodiment.

FIG. 2 shows the reasoning process for the question "Where do adult use glue sticks?" and one answer, "Office".

According to the present embodiment, the reasoning process is performed for one question and each of a plurality of answers, and one of the answers is finally determined to be the correct answer.

Referring to FIG. 2, a plurality of entities are extracted from the question and the answer, and it is checked whether the extracted entities exist in the knowledge graph.

The apparatus according to the present embodiment tokenizes the question to extract a plurality of entities; the entities may be words corresponding to nouns among the words in the question and words that include the answer.

This can be defined as a concept matching process, in which an n-gram technique is used to check whether words that can become entities exist in the knowledge graph.

For example, for the sentence "What is likely the result of a small episode of falling?", the plurality of entities may be {result, small, episode, fall}. Conventionally, for a phrase such as "falling asleep", all three candidates that can be matched in the knowledge graph, {fall, fall_asleep, asleep}, are returned.

Semantically, "fall" and "fall_asleep" have different meanings, so returning all of them is not useful. In the present embodiment, to extract entities that do not carry a different meaning, entities composed of relatively long word combinations are used.
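One way to prefer longer word combinations during concept matching is a greedy longest-first n-gram scan, sketched below in Python. This is an illustration under assumptions, not the matching procedure of the embodiment: the function name match_concepts, the max_n limit, the underscore joining convention, and the toy vocabulary VOCAB are all hypothetical, and the tokens are assumed to be lemmatized beforehand (for example, "falling" reduced to "fall").

```python
def match_concepts(tokens, vocabulary, max_n=3):
    """Greedy longest-first n-gram matching against a concept vocabulary.

    At each position the longest n-gram found in the vocabulary wins, so
    "fall asleep" yields fall_asleep rather than fall and asleep separately.
    """
    matched = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            candidate = "_".join(tokens[i:i + n])
            if candidate in vocabulary:
                matched.append(candidate)
                i += n  # skip the tokens consumed by this concept
                break
        else:
            i += 1  # no concept starts at this token
    return matched


# Toy vocabulary standing in for the entity names of a knowledge graph.
VOCAB = {"fall", "asleep", "fall_asleep", "result", "episode"}
FALL_ASLEEP = match_concepts(["fall", "asleep"], VOCAB)
RESULT_EPISODE = match_concepts(["result", "of", "episode"], VOCAB)
```

With this vocabulary, the two-word input matches only the longer concept, while words absent from the vocabulary, such as "of", are skipped.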

Thereafter, one or more paths for the plurality of entities are extracted with reference to the knowledge graph.

According to the present embodiment, one or more paths are extracted as described above in order to avoid the difficulty of extracting information from a question when an incomplete knowledge graph is used.

Using the entities extracted from the question and the answer, one or more paths existing between two entities in the knowledge graph are extracted. Paths may vary in length from short to long.

According to the present embodiment, relatively short paths (for example, of length 5 or less) are selected in order to choose meaningful paths from among the plurality of paths.
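The length-bounded path extraction can be sketched as a depth-first enumeration of simple paths. The Python below is a sketch under assumptions: path length is taken to mean the number of edges, only forward edges are followed (the back-translation example in this description also contains reversed edges), and the names find_paths, max_hops, and GRAPH are hypothetical.

```python
def find_paths(graph, start, goal, max_hops=5):
    """Enumerate simple paths from start to goal using at most max_hops edges.

    graph maps an entity to a list of (relation, target) pairs. Each path is
    an alternating list: [entity, relation, entity, ..., entity].
    """
    paths = []
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == goal:
            paths.append(path)
            continue
        hops = (len(path) - 1) // 2  # edges used so far
        if hops >= max_hops:
            continue
        for relation, target in graph.get(node, []):
            if target not in path[::2]:  # entities sit at even positions
                stack.append((target, path + [relation, target]))
    return paths


# Toy graph built from the example paths of FIG. 2.
GRAPH = {
    "adult": [("capableof", "work")],
    "work": [("atlocation", "office")],
    "glue_stick": [("atlocation", "office")],
}
ADULT_PATHS = find_paths(GRAPH, "adult", "office")
GLUE_PATHS = find_paths(GRAPH, "glue_stick", "office")
```

Lowering max_hops to 1 drops the two-edge path from "adult" while keeping the one-edge path from "glue_stick".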

In the knowledge graph, the paths that start from an entity extracted from the question and reach an entity extracted from the answer can be expressed as follows:

[expression not reproduced in the source text]

Because using all of the many paths is inefficient, a score is calculated for each path using the random walk probability, and the paths to be used for reasoning are selected on the basis of this score.
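The description does not give the random walk formula, so the Python sketch below assumes one common choice: the probability that a uniform random walk starting at the first entity follows exactly the given path, that is, the product of 1 / (number of outgoing edges) over the entities visited. The names random_walk_score, select_paths, top_k, and GRAPH are hypothetical.

```python
def random_walk_score(graph, path):
    """Probability that a uniform random walk from path[0] follows path.

    ASSUMPTION: each step picks one outgoing edge uniformly at random.
    path alternates entity, relation, entity, ...
    """
    score = 1.0
    for i in range(0, len(path) - 2, 2):
        out_edges = graph.get(path[i], [])
        if (path[i + 1], path[i + 2]) not in out_edges:
            return 0.0  # the path uses an edge the graph does not contain
        score *= 1.0 / len(out_edges)
    return score


def select_paths(graph, paths, top_k=1):
    """Keep the top_k paths with the highest random walk score."""
    ranked = sorted(paths, key=lambda p: random_walk_score(graph, p), reverse=True)
    return ranked[:top_k]


# Toy graph; the ("isa", "person") edge is invented to give "adult" two choices.
GRAPH = {
    "adult": [("capableof", "work"), ("isa", "person")],
    "work": [("atlocation", "office")],
    "glue_stick": [("atlocation", "office")],
}
PATH_A = ["adult", "capableof", "work", "atlocation", "office"]
PATH_B = ["glue_stick", "atlocation", "office"]
BEST = select_paths(GRAPH, [PATH_A, PATH_B], top_k=1)
```

Under this assumption, a path through entities with many outgoing edges scores lower, so the shorter and less ambiguous path from "glue_stick" is ranked first.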

FIG. 2 shows, by way of example, the extraction of two paths: "adult-->capableof-->work-->atlocation-->office" and "glue_stick-->atlocation-->office".

In the present embodiment, to use the paths for training, data augmentation is used to convert each path into a meaningful sentence.

In the data augmentation, the path is converted into a sentence by back-translation.

In this method, the path is translated into a particular language and then translated back into the original language.

For example, when a path such as people->capableOf->taste_food<-capableOf<-tongues is extracted, back-translation first translates the path into French and then back into English, generating a sentence such as "people can taste food with a tongue".

The input path and the converted sentence differ in form, but the meaning is preserved after back-translation.
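The round trip can be sketched in Python as the composition of two translation functions. The description does not name a translation system, so the translators here are injected as callables, and the demonstration uses toy lookup tables in place of a real machine translation model. The names path_to_text, back_translate, EN_TO_FR, and FR_TO_EN, as well as the table contents, are hypothetical.

```python
def path_to_text(path):
    """Flatten an alternating [entity, relation, entity, ...] path into words."""
    return " ".join(part.replace("_", " ") for part in path)


def back_translate(text, to_pivot, from_pivot):
    """Translate text into a pivot language, then back into the original one."""
    return from_pivot(to_pivot(text))


# Toy stand-ins for a real translation system (hypothetical contents).
EN_TO_FR = {"glue stick atlocation office": "le stick de colle est au bureau"}
FR_TO_EN = {"le stick de colle est au bureau": "the glue stick is in the office"}

PATH = ["glue_stick", "atlocation", "office"]
SENTENCE = back_translate(path_to_text(PATH), EN_TO_FR.__getitem__, FR_TO_EN.__getitem__)
```

Passing the translators as parameters keeps the round-trip logic independent of whichever translation model is actually used.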

According to the present embodiment, the one or more paths are converted into sentences, and the final converted sentence and the question are embedded using RoBERTa, a transformer-based language model.

That is, the converted sentence is embedded in a multidimensional space to generate an embedding vector corresponding to the converted sentence.

Among the RoBERTa model's tasks, the next sentence prediction task may be fine-tuned and used.

FIG. 3 is a diagram for explaining the process of embedding a path according to the present embodiment.

Referring to FIG. 3, the input data to the RoBERTa model (first input data) consists, as shown below, of a start token <s>, the question (q), a token <sep> for separating different segments, the converted sentence (s) resulting from the data augmentation, and an end token </s>.

Input Data: <s> q <sep> s </s>

In addition, according to the present embodiment, embedding is performed not only on the converted sentence but also on the combination of the question and the answer.

FIG. 4 is a diagram for explaining the process of embedding a combination of a question and an answer according to the present embodiment.

Referring to FIG. 4, second input data is used, which consists of a start token <s>, the question (q), a token <sep> for separating different segments, an answer choice, and an end token </s>.
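The two input layouts differ only in the second segment, which the following Python sketch makes explicit. Joining the tokens with single spaces is an assumption made for readability; an actual RoBERTa tokenizer inserts its own special tokens. The function name build_input and the sample sentence are hypothetical.

```python
def build_input(question, segment):
    """Lay out <s> question <sep> segment </s> as one whitespace-joined string."""
    return f"<s> {question} <sep> {segment} </s>"


QUESTION = "Where do adult use glue sticks?"
# First input data: the question plus a sentence converted from a path.
FIRST_INPUT = build_input(QUESTION, "the glue stick is in the office")
# Second input data: the question plus one answer choice.
SECOND_INPUT = build_input(QUESTION, "Office")
```

Both strings share the same prefix up to the <sep> token, so the two resulting embeddings differ only through the path-derived sentence versus the answer choice.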

According to the present embodiment, an attention mechanism is used to compare the semantic similarity between the embedding vector generated from the first input data, which includes the converted sentence and the question, and the embedding vector generated from the second input data, which includes the question and the answer.

Through the semantic similarity comparison, it can be determined whether the relationship between the question and each of the plurality of answers is meaningful.
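To give a feel for how a multi-head comparison of two embedding vectors produces a score, the Python below computes a simplified per-head scaled dot product and averages it over the heads. It is only a sketch: a real multi-head attention module applies learned query, key, and value projections and a softmax, none of which are modeled here, and the names split_heads, multi_head_score, and num_heads are hypothetical.

```python
import math


def split_heads(vector, num_heads):
    """Cut a vector into num_heads equally sized chunks."""
    if len(vector) % num_heads != 0:
        raise ValueError("vector length must be divisible by num_heads")
    size = len(vector) // num_heads
    return [vector[i * size:(i + 1) * size] for i in range(num_heads)]


def multi_head_score(first, second, num_heads=2):
    """Average over heads of the scaled dot product between matching chunks.

    SIMPLIFICATION: no learned projections and no softmax are applied.
    """
    if len(first) != len(second):
        raise ValueError("embedding vectors must have the same length")
    per_head = []
    for a, b in zip(split_heads(first, num_heads), split_heads(second, num_heads)):
        dot = sum(x * y for x, y in zip(a, b))
        per_head.append(dot / math.sqrt(len(a)))  # scale by sqrt of head size
    return sum(per_head) / num_heads


# Toy 4-dimensional vectors standing in for the two embeddings.
SAME = multi_head_score([1.0, 0.0, 0.0, 1.0], [1.0, 0.0, 0.0, 1.0])
ORTHOGONAL = multi_head_score([1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0])
```

Identical vectors give a positive score and orthogonal ones give zero, which is the behavior a similarity score between the two embeddings is expected to show.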

FIG. 5 is a diagram for explaining the process of determining the correct answer to a question according to the present embodiment.

Referring to FIG. 5, a plurality of trained models are provided, corresponding to the plurality of combinations (Question+choice1 to Question+choice5) of one question with each of a plurality of answers. Each model receives as input the embedding vector for its combination and the embedding vector generated from the sentence into which the one or more paths extracted from that combination were converted.

Each model outputs an attention score (score1 to score5) for semantic similarity by comparing the embedding vectors, and the answer with the highest value after softmax is determined to be the correct answer to the question.
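The final selection step can be sketched in Python as a softmax over the five scores followed by picking the largest probability. The score values below are made up for illustration, and the names softmax, pick_answer, CHOICES, and SCORES are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pick_answer(choices, scores):
    """Return the choice with the highest softmax probability, and that probability."""
    probs = softmax(scores)
    best = max(range(len(choices)), key=probs.__getitem__)
    return choices[best], probs[best]


CHOICES = ["choice1", "choice2", "choice3", "choice4", "choice5"]
SCORES = [0.1, 2.0, 0.3, -1.0, 0.5]  # illustrative values for score1 to score5
ANSWER, PROBABILITY = pick_answer(CHOICES, SCORES)
```

Because softmax preserves the ordering of the scores, the selected answer is the same as taking the maximum raw score; the softmax additionally yields a probability that can be reported alongside the answer.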

The above-described embodiments of the present invention have been disclosed for illustrative purposes; those of ordinary skill in the art will be able to make various modifications, changes, and additions within the spirit and scope of the present invention, and such modifications, changes, and additions should be regarded as falling within the scope of the following claims.

Claims

An explainable commonsense question-answer reasoning completion device, comprising:
a processor; and
a memory coupled to the processor,
wherein the memory stores program instructions executable by the processor to:
extract a plurality of entities from a plurality of combinations of a question and each of a plurality of answers;
extract, for each of the plurality of combinations, one or more paths between the plurality of extracted entities with reference to a knowledge graph;
convert the one or more paths into a sentence through data augmentation;
generate a first embedding vector by combining the sentence and the question;
generate a second embedding vector for each of the plurality of combinations;
input the first embedding vector and the second embedding vector to a multi-head attention module to calculate an attention score for each of the plurality of combinations; and
infer one of the plurality of answers corresponding to the question as the correct answer through the attention score.

The device according to claim 1,
wherein the program instructions
tokenize the question to extract a plurality of entities, and check whether words corresponding to the plurality of entities extracted from the question exist in the knowledge graph.

The device according to claim 1,
wherein the program instructions
select, from among the paths connecting an entity extracted from the question and an entity extracted from the answer, a path whose length is equal to or less than a preset length, and convert the selected path into a sentence.

The device according to claim 1,
wherein the program instructions
convert the one or more paths into a sentence through back-translation, by translating the one or more paths expressed in a first language into a second language and translating the result back into the first language.

The device according to claim 1,
wherein the program instructions
generate the first embedding vector and the second embedding vector using a RoBERTa model,
wherein first input data for generating the first embedding vector consists of a start token <s>, the question, a token <sep> separating different segments, the sentence, and an end token </s>, and
wherein second input data for generating the second embedding vector consists of a start token <s>, the question, a token <sep> separating different segments, one of the plurality of answers, and an end token </s>.

The device according to claim 1,
wherein the multi-head attention module includes a plurality of learning models corresponding to the plurality of combinations, and
wherein the plurality of learning models calculate a plurality of attention scores by receiving as input the second embedding vector for each of the plurality of combinations and the second embedding vector generated from the sentence into which the one or more paths extracted from each of the plurality of combinations have been converted.

A method of inferring an answer to a commonsense question in a device comprising a processor and a memory, the method comprising:
extracting, for each of the plurality of combinations, one or more paths between the plurality of extracted entities with reference to a knowledge graph;
converting the one or more paths into a sentence through data augmentation;
generating a first embedding vector by combining the sentence and the question;
generating a second embedding vector for each of the plurality of combinations;
calculating an attention score for each of the plurality of combinations by inputting the first embedding vector and the second embedding vector to a multi-head attention module; and
inferring one of the plurality of answers corresponding to the question as the correct answer through the attention score.

The method according to claim 7,
wherein the first embedding vector and the second embedding vector are generated using a RoBERTa model,
wherein first input data for generating the first embedding vector consists of a start token <s>, the question, a token <sep> separating different segments, the sentence, and an end token </s>, and
wherein second input data for generating the second embedding vector consists of a start token <s>, the question, a token <sep> separating different segments, one of the plurality of answers, and an end token </s>.

The method according to claim 7,
wherein the multi-head attention module includes a plurality of learning models corresponding to the plurality of combinations, and
wherein the plurality of learning models calculate a plurality of attention scores by receiving as input the second embedding vector for each of the plurality of combinations and the second embedding vector generated from the sentence into which the one or more paths extracted from each of the plurality of combinations have been converted.

A computer readable program for performing the method according to claim 7.