KR102352251B1

KR102352251B1 - Method of High-Performance Machine Reading Comprehension through Feature Selection

Info

Publication number: KR102352251B1
Application number: KR1020190073533A
Authority: KR
Inventors: 김학수; 이현구
Original assignee: 강원대학교 산학협력단
Priority date: 2018-06-20
Filing date: 2019-06-20
Publication date: 2022-01-17
Also published as: KR20190143415A

Abstract

The present invention relates to machine reading comprehension through feature selection, that is, to a question-answering model in which a machine understands a given context and answers queries about it. To obtain a high-performance machine reading comprehension model, the invention proposes GF-Net (Gated Feature Networks), a model that improves sentence understanding by using pre-trained contextual information and applies a feature gate method with which the model itself selects useful features.

Description

Method of High-Performance Machine Reading Comprehension through Feature Selection

The present invention relates to a high-performance machine reading comprehension method through feature selection.

Machine reading comprehension is a question-answering model in which a machine understands a given context and answers queries about it. As an example of a conventional question-answering model, Korean Patent Laid-Open Publication No. 10-2002-0030545 (April 25, 2002) proposed a method of providing automatic answers and search results for natural-language questions based on artificial intelligence and natural language processing technology. Because that method relies on a knowledge database, however, building the database requires considerable resources.

Machine reading comprehension is an essential research topic for self-improving artificial intelligence that acquires and uses information from documents. It has been actively studied since deep neural networks made attention mechanisms and end-to-end models practical, and a three-stage structure of encoding, co-attention, and answer extraction has become established. For the encoding stage, research has aimed to improve sentence understanding by using, in addition to word embeddings, features such as syllable embeddings, parts of speech, and named entities. More recently, pre-training a language model on a large corpus, as in CoVe and ELMo, and using its output alongside these inputs has greatly improved overall model performance. For the co-attention stage, the approaches studied include the attention method proposed in R-Net, which identifies which query words each context word is related to; self-attention, which re-examines the context; and the bidirectional query-context attention proposed in Bi-directional Attention Flow. Many studies use these methods. For the answer extraction stage, research has focused on finding the start and end positions of the words that form the answer, using methods such as pointer networks and Stochastic Prediction Dropout.

The present invention proposes a method of improving performance by applying a feature gate, with which the model itself selects useful features, to the encoding stage and the answer extraction stage among the three stages of machine reading comprehension.

The present invention is characterized by GF-Net (Gated Feature Networks), a machine reading comprehension model that, to achieve high performance, improves sentence understanding through pre-trained contextual information and applies a feature gate method with which the model itself selects useful features.

The method includes (a) an encoding step of generating a context encoding vector and a query encoding vector through feature selection and pre-trained information; (b) a bidirectional co-attention step of generating a co-attention result vector between the context and the query from the encoding step; and (c) an answer extraction step reflecting feature selection, in which features are selected using the co-attention result vector and passed to a pointer network.

Preferably, the encoding step applies a normalization vector and a scalar parameter to the vector formed by concatenating the embedding vector and the hidden vectors of a word used in the language model, thereby mixing the layer-wise abstraction information of a pre-trained multi-layer recurrent neural network language model.

In addition, the encoding step inputs, for each word in the sentence, the word embedding, the syllable embedding, and feature vectors weighted per feature according to the context into a recurrent neural network, and concatenates the pre-trained ELMo vector to the output to generate the context encoding vector and the query encoding vector.

The bidirectional co-attention step between the context and the query finds the relationship between the context and the query from the context encoding vector and the query encoding vector of the encoding step.

In the answer extraction step reflecting feature selection, the selected features are concatenated with the existing co-attention vector and passed to the pointer network.

For a high-performance machine reading comprehension model, the present invention has the effect of improving sentence understanding through pre-trained contextual information and of allowing the model itself to select useful features.

FIG. 1 is a structural diagram of a model according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating how feature weights change when feature gates are used, according to an embodiment of the present invention.
FIG. 3 is an exemplary diagram illustrating the encoding layer, interaction layer, and pointing layer of the high-performance machine reading comprehension method through feature selection according to an embodiment of the present invention.
FIG. 4 is an exemplary diagram showing how the linguistic features of a word are changed in the encoding layer in the high-performance machine reading comprehension method through feature selection according to an embodiment of the present invention.

The high-performance machine reading comprehension method through feature selection according to an embodiment of the present invention proposes GF-Net (Gated Feature Networks), a machine reading comprehension model that improves sentence understanding through pre-trained contextual information and applies a feature gate method with which the model itself selects useful features. An embodiment of the present invention proposes a method of improving performance by applying the feature gate to the encoding stage and the answer extraction stage among the three stages of machine reading comprehension.

Machine reading comprehension (MRC) is an artificial intelligence technology in which a computer understands a document and automatically finds, in the given document, the answer to a user's question. Existing MRC models mainly combine word embedding features, which map words into a semantic space, with linguistic features such as parts of speech, named entities, and syntactic relations, and then infer the answer by computing the relevance between the document and the question. Because these models simply concatenate the word embedding features and the linguistic features, the characteristics of the individual linguistic features become mixed, and the resulting feature interference problem cannot be avoided. That is, although linguistic features play different roles in answer inference, they are used in simple combination and therefore contribute little to MRC performance. For example, part of speech can provide an important clue for restricting answer candidates to numbers (words whose part of speech is numeral) in questions about dates, times, exchange rates, and the like, and named entities can provide an important clue for restricting answer candidates to the relevant semantic category in questions about people, places, organizations, and the like. Syntactic relations can provide an important clue for selecting answer candidates based on information such as whether the target of the question is a subject or an object. Linguistic features that play different roles in answer inference therefore need to be used selectively, so that each operates according to its role, rather than simply being combined.

Hereinafter, an embodiment of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 shows the structure of the proposed model. The proposed model consists of three stages: encoding, co-attention, and answer extraction. The encoding stage takes as input word embeddings, syllable embeddings for handling out-of-vocabulary words, pre-trained ELMo vectors, and the feature vectors produced by the feature gate proposed in the present invention. The co-attention stage uses bidirectional co-attention between the query and the context together with self-attention. Finally, the answer extraction stage uses a pointer network with an added feature selection function to return the start and end positions of the phrase in the context that will be used as the answer.

As shown in FIG. 1, a system for implementing the high-performance machine reading comprehension method through feature selection according to an embodiment of the present invention consists of an encoding layer, an interaction layer, and a pointing layer. The encoding layer generates word embedding vectors and linguistic feature vectors from the document and the question. A gate mechanism that combines the linguistic feature vectors with the word embedding vectors then leads the linguistic features to participate selectively in the machine reading comprehension process. The interaction layer computes how each word of the document relates to each word of the question through a bi-directional co-attention mechanism and a self-attention mechanism. The pointing layer recombines the linguistic features with the vectors generated in the interaction stage through a gate mechanism and then determines the position of the answer in the document using a pointer network.

Here, a pointer network is a deep learning model that extends the recurrent neural network (RNN): it uses an attention mechanism on top of an RNN to output a list of positions in the input sequence. An attention mechanism improves model performance by making a deep learning model focus on particular vectors. In the sequence-to-sequence (S2S) architecture for machine translation, the front part, which turns the source language (A, B, C) into a vector, is called the encoder, and the rear part, which takes the encoder's output vector and produces the target language (W, X, Y, Z), is called the decoder. As the source and target sentences grow longer, model performance degrades. When predicting W, attending to all of A, B, and C can reduce accuracy; the idea of the attention mechanism is to make the model attend only to the important parts. For example, suppose an S2S model translates the German "Ich mochte ein bier" into the English "I'd like a beer". When the model predicts the fourth word, 'beer', it should focus on 'bier'. The assumption behind the attention mechanism is that the vector the encoder produces for 'bier' (the encoder output) will be similar to the vector the decoder uses when predicting 'beer' (the decoder input).
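The intuition in the 'bier'/'beer' example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 2-dimensional vectors are made up, and a plain dot product is used as the similarity score, which the text does not specify. The decoder state is compared with every encoder output, the scores are normalized with softmax, and the largest weight falls on the most similar source word.

```python
import math

def softmax(scores):
    """Normalize raw scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product, used here as an assumed similarity score."""
    return sum(x * y for x, y in zip(a, b))

def attend(decoder_state, encoder_outputs):
    """Return attention weights and the weighted sum of encoder outputs."""
    weights = softmax([dot(decoder_state, h) for h in encoder_outputs])
    dim = len(encoder_outputs[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, encoder_outputs))
              for d in range(dim)]
    return weights, pooled

# Hypothetical encoder outputs for the four source words.
source_words = ["Ich", "mochte", "ein", "bier"]
encoder_outputs = [[1.0, 0.0], [0.5, 0.5], [0.2, 0.1], [0.0, 2.0]]
# Hypothetical decoder state while predicting the fourth target word, "beer".
decoder_state = [0.0, 1.5]

weights, pooled = attend(decoder_state, encoder_outputs)
focus = source_words[weights.index(max(weights))]
```

With these made-up vectors the decoder state is most similar to the encoder output for 'bier', so `focus` names that word.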

The core of the high-performance machine reading comprehension method through feature selection according to an embodiment of the present invention is the gate mechanism that selectively combines word information with linguistic feature information, together with the overall structure of an artificial neural network optimized for machine reading comprehension. Improving machine reading comprehension performance can raise user satisfaction with various question-answering services (intelligent search, artificial intelligence assistant services, and the like), and revenue can therefore be expected through expansion of the related market.

The high-performance machine reading comprehension method through feature selection according to an embodiment of the present invention includes an encoding step using feature selection and pre-trained information, a bidirectional co-attention step between the context and the query, and an answer extraction step reflecting feature selection.

First, the encoding step using feature selection and pre-trained information, which improves sentence understanding through pre-trained contextual information and lets the model itself select useful features for a high-performance machine reading comprehension model, is described as follows.

In the present invention, the encoding step using feature selection and pre-trained information uses word embeddings, syllable embeddings, pre-trained ELMo vectors, and feature vectors. Syllable embeddings are used to address the out-of-vocabulary problem, in which a particular word does not appear in the word embedding matrix. A syllable embedding represents the syllables of a word as a single vector through a convolutional neural network. The ELMo vector is generated as in Equation (1) after pre-training a bidirectional language model.

[Equation 1]

$$R_k = \{\, x_k^{LM},\ h_{k,j}^{LM} \mid j = 1, \dots, L \,\}$$

$$\mathrm{ELMo}_k = \gamma \sum_{j=0}^{L} s_j\, h_{k,j}^{LM}, \qquad h_{k,0}^{LM} = x_k^{LM}$$

$R_k$ is the vector that concatenates the input ($x_k^{LM}$) and the outputs ($h_{k,j}^{LM}$) of the bidirectional LSTM. LM denotes the language model, and $L$ is the number of bidirectional LSTM layers. LSTM (Long Short-Term Memory network) is a special kind of recurrent neural network that can learn long-term dependencies; it was introduced by Hochreiter and Schmidhuber (1997), and a detailed description is omitted here.

In Equation (1), $x_k^{LM}$ is the embedding vector of the k-th word used in the language model, $h_{k,j}^{LM}$ is the hidden vector of the j-th layer of the multi-layer bidirectional recurrent neural network, and $R_k$ is the vector concatenating $x_k^{LM}$ and $h_{k,j}^{LM}$. $s$ is the vector used to normalize the multi-layer recurrent neural network vectors layer by layer, and the weight of each layer is $s_j$. $\gamma$ is a scalar parameter used for effective training. $\mathrm{ELMo}_k$ is the result of applying the normalization vector ($s$) and the scalar parameter ($\gamma$) to $R_k$, that is, of mixing the layer-wise abstraction information of the pre-trained multi-layer recurrent neural network language model.
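The layer mixing described for Equation (1) can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the per-layer weights are obtained by applying softmax to unnormalized values, as in the published ELMo formulation, and the function name `mix_layers` and the toy vectors are hypothetical.

```python
import math

def softmax(values):
    """Turn unnormalized layer weights into weights that sum to 1."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def mix_layers(layer_vectors, layer_logits, gamma):
    """Collapse the per-layer vectors of one word into a single vector.

    layer_vectors: L + 1 vectors (embedding layer first, then each LSTM layer)
    layer_logits:  one unnormalized weight per layer (assumed parameterization)
    gamma:         scalar scale applied to the weighted sum
    """
    s = softmax(layer_logits)
    dim = len(layer_vectors[0])
    return [gamma * sum(s_j * h[d] for s_j, h in zip(s, layer_vectors))
            for d in range(dim)]

# Hypothetical word with an embedding layer and two LSTM layers (L = 2).
layers = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed = mix_layers(layers, [0.0, 0.0, 0.0], gamma=2.0)
```

With equal logits every layer receives weight 1/3, so each component of `mixed` is the layer average scaled by `gamma`.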

That is, Equation (1) appropriately mixes the layer-wise abstraction information of the pre-trained multi-layer recurrent neural network language model. Finally, the feature vectors to which the feature selection proposed in the present invention has been applied are used. Feature selection uses the gating idea found in recurrent neural networks to give each feature a different weight. FIG. 2 shows how feature vectors change when feature gates are used.

The example in FIG. 2 uses named-entity features and syntactic features. For "Newton" and "DeMarcus Ware" in the example, named-entity information is more important than syntactic information. By contrast, "was sacked" is not a named entity, so its named-entity weight needs to be lowered. The feature gate that assigns a different proportion to each feature, as in FIG. 2, follows Equation (2).

[Equation 2]

$$h_i = \mathrm{BiRNN}\big([\,e^{word}_i;\ e^{char}_i;\ \mathrm{ELMo}_i\,]\big)$$

$$g_{ij} = \sigma\big(W_j\,[\,h_i;\ f_{ij}\,]\big)$$

$$\tilde{f}_{ij} = g_{ij}\, f_{ij}$$

In Equation (2), $h_i$ is the i-th vector of the output obtained by feeding the word embeddings, syllable embeddings, and ELMo vectors of the words in the sentence into a recurrent neural network. BiRNN denotes a bidirectional recurrent neural network, $e^{word}_i$ is the word embedding, $e^{char}_i$ is the syllable embedding, and $\mathrm{ELMo}_i$ is the ELMo vector.

$g_{ij}$ is the output of the gate for the j-th feature of the i-th word (the weight of that feature); it takes a value between 0 and 1 through the sigmoid activation function ($\sigma$). $W_j$ is the weight used in the j-th feature gate, and $[\,h_i;\ f_{ij}\,]$ is the concatenation of the context-aware vector $h_i$ and the feature vector $f_{ij}$. After the gate is computed, $\tilde{f}_{ij} = g_{ij}\, f_{ij}$ multiplies each feature vector by its gate so that the proportion of each feature is adjusted to the context. $\tilde{f}_{ij}$ is the product of the feature weight obtained through the gate and the corresponding feature vector ($f_{ij}$).
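A feature gate of the kind described for Equation (2) can be sketched as below. This is a sketch under assumptions: the gate is taken to be a single scalar per feature with no bias term, and all names and numbers (`feature_gate`, `apply_gate`, the vectors and weights) are hypothetical values chosen for illustration.

```python
import math

def sigmoid(x):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def feature_gate(h_i, f_ij, w_j):
    """Scalar gate for one feature of one word: sigmoid(w_j . [h_i; f_ij])."""
    concat = h_i + f_ij  # list concatenation stands in for vector concatenation
    return sigmoid(sum(w * x for w, x in zip(w_j, concat)))

def apply_gate(h_i, f_ij, w_j):
    """Return the gate value and the feature vector scaled by it."""
    g = feature_gate(h_i, f_ij, w_j)
    return g, [g * x for x in f_ij]

# Hypothetical contextual vector of a word and its named-entity feature vector.
h_i = [0.5, -0.2]
ner_feature = [1.0, 0.0, 0.0]
# Hypothetical gate weights; length = len(h_i) + len(ner_feature).
ner_weights = [2.0, 0.0, 3.0, 0.0, 0.0]

gate, gated_feature = apply_gate(h_i, ner_feature, ner_weights)
```

Because the gate depends on the contextual vector as well as on the feature itself, the same feature can receive a high weight for one word and a low weight for another, which is the behaviour FIG. 2 illustrates.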

Finally, the word embedding $e^{word}_i$, the syllable embedding $e^{char}_i$, the ELMo vector $\mathrm{ELMo}_i$, and the gated feature vector $\tilde{f}_i$ are input to a recurrent neural network, and the ELMo vector is concatenated to the output; the result $[\,u_i;\ \mathrm{ELMo}_i\,]$ is used as the encoding vector. Here $u_i$ is the encoding vector produced by the recurrent neural network. Applying this procedure to the context and to the query separately yields the context encoding vectors $p_i$ and the query encoding vectors $q_j$.

Next, the bidirectional co-attention between the context and the query is described as follows.

The present invention uses self-attention together with bidirectional co-attention to find the relationship between the context and the query. Bidirectional co-attention is computed as in Equation (3).

[Equation 3]

$$S_{ij} = \alpha(p_i, q_j) = w_S^{\top}\,[\,p_i;\ q_j;\ p_i \circ q_j\,]$$

$$a_i = \mathrm{softmax}(S_{i:}), \qquad \tilde{q}_i = \sum_j a_{ij}\, q_j$$

$$F_i = \beta(p_i,\ \tilde{q}_i,\ \tilde{p})$$

In Equation (3), $S_{ij}$ is the attention score between $p_i$ and $q_j$, and $\alpha$ is the function that computes $S_{ij}$. $p_i$ is the encoding vector of the i-th word of the context, $q_j$ is the encoding vector of the j-th word of the query, $w_S$ is the weight used when computing the attention score, and $p_i \circ q_j$ is the elementwise multiplication of $p_i$ and $q_j$. $a_i$ is the context-to-query attention weight, and softmax is the function that computes this weight from the attention scores ($S_{i:}$). $\tilde{q}_i$ is the attention vector that captures how important the query vectors are to the context, and $\tilde{p}$ is the attention vector that captures how important the context vectors are to the query. Equation (3) computes the co-attention weights between the context vectors and the query vectors and combines them with the context vectors; the resulting $F_i$ identifies which words of the context the query word vectors act on most strongly. In $F_i = \beta(p_i, \tilde{q}_i, \tilde{p})$, $\beta$ denotes a function and $p_i$ denotes the encoding vectors of the context words. Self-attention is computed as in Equation (4).

[Equation 4]

$$\omega^{t}_{i} = \frac{\exp(s^{t}_{i})}{\sum_j \exp(s^{t}_{j})}, \qquad c_t = \sum_i \omega^{t}_{i}\, F_i$$

$$m_t = \mathrm{BiRNN}\big([\,F_t;\ c_t\,]\big)$$

In Equation (4), $m_t$ is the output vector of the bidirectional recurrent neural network that takes the context-query co-attention vector ($F$) and the self-attention vector ($c$) as input. $F_t$ is the output of the co-attention layer, and $c_t$ is the output of the self-attention layer. $s^{t}_{j}$ is the self-attention weight score between the t-th word and the j-th word. $F_i$ is the vector in which co-attention is reflected, and $\omega^{t}_{i}$ is the attention weight expressing the similarity of the t-th word to the i-th word. That is, the current word identifies related important words by computing its similarity to the other words, and attention pooling produces $c_t$, which reflects the words of the whole sentence. Because the generated self-attention vector may not always be helpful, a residual layer is applied so that it is used as the situation requires; the output of the residual layer is denoted $r_t$.
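The context-to-query side of the co-attention in Equation (3) and the attention pooling of Equation (4) can be sketched together, since both reduce to "score, softmax, weighted sum". This is a simplified illustration under assumptions: the co-attention score uses the weight-times-concatenation form of Bi-directional Attention Flow, the self-attention score is a plain dot product (the text does not give its form), and the query-to-context direction, the bidirectional RNN, and the residual layer are omitted. All names and numbers are hypothetical.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_sum(weights, vectors):
    """Attention pooling: sum of vectors scaled by their weights."""
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]

def score(p_i, q_j, w_s):
    """Attention score w_s . [p_i; q_j; p_i * q_j] (elementwise product last)."""
    joined = p_i + q_j + [a * b for a, b in zip(p_i, q_j)]
    return sum(w * x for w, x in zip(w_s, joined))

def context_to_query(context, query, w_s):
    """For every context word, pool the query vectors by attention weight."""
    attended = []
    for p_i in context:
        weights = softmax([score(p_i, q_j, w_s) for q_j in query])
        attended.append(weighted_sum(weights, query))
    return attended

def self_attention(vectors):
    """For every position, pool the whole sequence (dot-product score assumed)."""
    pooled = []
    for v_t in vectors:
        weights = softmax([sum(a * b for a, b in zip(v_t, v_j)) for v_j in vectors])
        pooled.append(weighted_sum(weights, vectors))
    return pooled

# Hypothetical 2-dimensional encodings: three context words, two query words.
context = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [[1.0, 0.0], [0.0, 1.0]]
# Hypothetical weights: only the elementwise-product block contributes.
w_s = [0.0, 0.0, 0.0, 0.0, 5.0, 5.0]

q_tilde = context_to_query(context, query, w_s)
c = self_attention(context)
```

In this toy setup the first context word matches the first query word, so its attended query vector is dominated by that word, while the third context word matches both query words equally and receives their average.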

Answer extraction reflecting feature selection is described as follows.

The present invention uses a pointer network with an added feature selection function for answer extraction. Feature selection in answer extraction is expressed as in Equation (5).

[Equation 5]

$$z_i = \mathrm{BiRNN}(r_i)$$

$$\hat{g}_{ij} = \sigma\big(\hat{W}_j\,[\,z_i;\ f_{ij}\,]\big), \qquad \hat{f}_{ij} = \hat{g}_{ij}\, f_{ij}$$

$$o_i = [\,r_i;\ \hat{f}_i\,]$$

In Equation (5), $z_i$ is the output of the bidirectional recurrent neural network that takes the residual layer output $r_i$ as input, BiRNN denotes a bidirectional recurrent neural network, and $r_i$ is the residual layer output. $\hat{g}_{ij}$ is the gate of the j-th feature of the i-th word, $\hat{W}_j$ is the weight used in the j-th feature gate, $[\,z_i;\ f_{ij}\,]$ is the concatenation of the context-aware vector $z_i$ and the feature vector $f_{ij}$, $\hat{f}_{ij}$ is the product of the feature weight obtained through the gate and the corresponding feature vector ($f_{ij}$), $o_i$ is the concatenation of the residual layer output ($r_i$) and the feature vector ($\hat{f}_i$), and $f$ denotes a feature vector.

Unlike the encoding step, answer extraction selects features using the co-attention result vector. The selected features are concatenated with the existing co-attention vector and passed to the pointer network. The pointer network is given by Equation (6).

[Equation 6]

$$\eta^{t}_{j} = v^{\top} \tanh\big(W_o\, o_j + W_d\, d_{t-1}\big)$$

$$\pi^{t}_{i} = \frac{\exp(\eta^{t}_{i})}{\sum_j \exp(\eta^{t}_{j})}$$

$$y^{t} = \arg\max\big(\pi^{t}_{1}, \dots, \pi^{t}_{n}\big)$$

$$k_t = \sum_i \pi^{t}_{i}\, o_i$$

In Equation (6), $\eta^{t}_{j}$ is the attention score in the pointer network, $v$ is the weight for computing the attention score of the pointer network, $W_o\, o_j$ is the product of the feature-applied encoding vector and a weight, $W_d\, d_{t-1}$ is the product of the current decoding state vector and a weight, and $d_t$ is the decoder state vector for the pointer, whose 0th vector is generated by attention pooling over the encoding vectors of the query. $\pi^{t}$ is the distribution over encoder positions computed by the pointer network, and $y^{t}$ is the output of the pointer network, where $y^{1}$ is the start position and $y^{2}$ is the end position. $k_t$ is the input to the next decoder step, and $o_i$ is the feature-applied encoding vector.
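One step of a pointer network as described for Equation (6) can be sketched as follows. This is a toy illustration under assumptions: the score uses an additive tanh form, the weight matrices are identities, and the two decoder states are hand-picked stand-ins for what an RNN cell and attention pooling over the query would produce. All names and numbers are hypothetical.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

def pointer_step(encodings, state, w_o, w_d, v):
    """Score every position, normalize, and return the most likely position."""
    projected_state = matvec(w_d, state)
    scores = []
    for o_j in encodings:
        hidden = [math.tanh(a + b)
                  for a, b in zip(matvec(w_o, o_j), projected_state)]
        scores.append(sum(v_k * h_k for v_k, h_k in zip(v, hidden)))
    distribution = softmax(scores)
    position = distribution.index(max(distribution))
    # Attention-pooled encoding, which would feed the next decoder step.
    dim = len(encodings[0])
    pooled = [sum(a * o[d] for a, o in zip(distribution, encodings))
              for d in range(dim)]
    return position, distribution, pooled

# Hypothetical feature-applied encodings for a four-word context.
encodings = [[0.1, 0.0], [2.0, 0.0], [0.0, 0.1], [0.0, 2.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
v = [1.0, 1.0]

# First step selects the start position, second step the end position.
start, start_dist, pooled = pointer_step(encodings, [-1.0, 3.0], identity, identity, v)
end, end_dist, _ = pointer_step(encodings, [3.0, -1.0], identity, identity, v)
```

The answer span is then the context words from index `start` to index `end`; in a trained model the second state would be derived from the first state and `pooled` rather than chosen by hand.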

The present invention uses SQuAD (Stanford Question Answering Dataset) for experiments. SQuAD contains more than 100,000 question-answer pairs, of which 87,599 are released as training data and 10,570 as development data; the official test data is not public. The model was trained on the public training data, its parameters were tuned on the development data, and it was submitted to SQuAD to measure performance.

The present invention uses Exact Match and F1-score as performance measures of the model. Exact Match assigns 1 if the answer predicted by the model exactly matches the correct answer and 0 otherwise, and F1-score is the F1 measured at the word level. Table 1 compares the proposed model with models submitted to SQuAD whose descriptions are public, using the performance figures reported in those descriptions.

[Table 1]

In Table 1, "single" is the performance of a single model, and "ensemble" is the performance measured using several models of the same architecture. GF-Net, the model proposed in the present invention, shows higher performance than the other compared models.
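The two measures, Exact Match and word-level F1-score, can be sketched as follows. This is a simplified version: it only lowercases and splits on whitespace, whereas an official evaluation may apply further normalization, which is not reproduced here. The function names are hypothetical.

```python
from collections import Counter

def tokens(text):
    """Lowercase and split on whitespace (simplified normalization)."""
    return text.lower().split()

def exact_match(prediction, answer):
    """1 if the predicted answer equals the correct answer, otherwise 0."""
    return 1 if tokens(prediction) == tokens(answer) else 0

def f1_score(prediction, answer):
    """Harmonic mean of word-level precision and recall."""
    pred, gold = tokens(prediction), tokens(answer)
    # Multiset intersection counts each shared word at most as often as it
    # appears in both the prediction and the correct answer.
    overlap = sum((Counter(pred) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)

em = exact_match("Cam Newton", "Newton")
f1 = f1_score("Cam Newton", "Newton")
```

A prediction that contains the correct answer plus one extra word scores 0 on Exact Match but still earns partial credit on F1, which is why both measures are reported.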

The present invention proposes a machine reading comprehension model that improves performance by adjusting the importance of input features through feature selection using feature gates. In experiments on SQuAD, it outperformed existing high-performing comparison models. Because the model adjusts weights through the gates even when the usefulness of a new feature has not been verified, the proposed feature selection method is expected to make adding features efficient. As future work, the feature selection method proposed in the present invention will be improved through other methods.

Figure 3 is an exemplary diagram for explaining the encoding layer, the interaction layer, and the pointing layer of the high-performance machine reading comprehension method through feature selection according to an embodiment of the present invention. As shown in Figure 3, in the encoding layer the word embedding vectors (GloVe, Character Embedding, ELMo) and the linguistic feature vectors (Feature) are combined through feature gates (Feature Gate) and used as the input of a BiRNN (Bidirectional Recurrent Neural Network), which encodes the document (Context) and the question (Question). The interaction layer computes the relationships between the words in the document and the words in the question based on a bi-directional co-attention mechanism and a self-attention mechanism. The pointing layer combines the linguistic features, through feature gates, with the vectors generated in the interaction layer, and then determines the position of the answer through a pointer network.
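The last step of the pointing layer, determining the position of the answer, can be illustrated with a small sketch. It assumes that the pointer network yields one probability per document word for the answer start and one for the answer end, and that the answer is the span maximizing their product; this decoding rule is a common convention and is an assumption here, not something stated in this description. The function name `best_span` and the optional `max_len` limit are hypothetical.

```python
def best_span(start_probs, end_probs, max_len=None):
    """Return ((start, end), score) for the highest-scoring span with start <= end."""
    best = (0, 0)
    best_score = -1.0
    n = len(end_probs)
    for i, p_start in enumerate(start_probs):
        # Optionally limit the span length; otherwise scan to the end of the document.
        last = n if max_len is None else min(n, i + max_len)
        for j in range(i, last):
            score = p_start * end_probs[j]
            if score > best_score:
                best, best_score = (i, j), score
    return best, best_score


# Toy distributions over a three-word document (made-up numbers).
span, score = best_span([0.1, 0.7, 0.2], [0.6, 0.1, 0.3])
```

Requiring the start index not to exceed the end index rules out spans such as (1, 0), even though word 0 has the highest end probability in this toy example.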

Figure 4 is an exemplary diagram showing how the linguistic features of a word are changed in the encoding layer of the high-performance machine reading comprehension method through feature selection according to an embodiment of the present invention. For the input question "Who was sacked as the first half clock expired?", Figure 4 shows how the linguistic features of the word 'Newton', which appears in the document containing the answer, are changed in the encoding layer. As shown in Figure 4, the part-of-speech feature embedding vector, the dependency feature embedding vector, and the named-entity feature embedding vector of 'Newton' are rescaled as they pass through the feature gates, according to how much each contributes to selecting the answer. Figure 4 shows contributions that were assigned automatically through training, in the order named-entity feature gate (1.0), dependency feature gate (0.8), and part-of-speech feature gate (0.5).
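The rescaling shown in Figure 4 can be mimicked with a toy example. The sketch below takes the three gate values from Figure 4 as given and multiplies each feature embedding by its gate before concatenating the result with the word vector; how the gates themselves are computed is not reproduced here. The embedding values, the dictionary keys, and the function names are made up for illustration.

```python
def apply_gate(embedding, gate):
    # Scale every component of a feature embedding by its scalar gate value.
    return [gate * x for x in embedding]


def gated_input(word_vector, features, gates):
    # Concatenate the word vector with each gated feature embedding.
    combined = list(word_vector)
    for name, embedding in features.items():
        combined.extend(apply_gate(embedding, gates[name]))
    return combined


# Gate values as reported for 'Newton' in Figure 4.
gates = {"named_entity": 1.0, "dependency": 0.8, "part_of_speech": 0.5}

# Hypothetical two-dimensional feature embeddings and a one-dimensional word vector.
features = {
    "named_entity": [1.0, 2.0],
    "dependency": [1.0, -1.0],
    "part_of_speech": [4.0, 2.0],
}
encoder_input = gated_input([0.5], features, gates)
```

With these values the named-entity embedding passes through unchanged, while the part-of-speech embedding is halved, so a feature with a low gate value contributes less to the vector fed into the encoder.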

Claims

A high-performance machine reading comprehension method through feature selection, comprising:
(a) an encoding step of generating a context encoding vector and a query encoding vector through feature selection using a feature gate that adjusts the degree to which each feature is reflected by calculating weights between features;
(b) a bidirectional co-attention step of generating a co-attention result vector between the context and the query from the encoding step; and
(c) a response extraction step of transferring the co-attention result vector to a pointer network whose weights are set according to the feature gate.

According to claim 1,
The high-performance machine reading comprehension method through feature selection, characterized in that, in the encoding step, a normalization vector and a scalar parameter are applied to the vector that concatenates the embedding vector and the hidden vectors of a word used in the language model, so that the layer-by-layer abstraction information of a pre-trained multilayer recurrent neural network language model is mixed.

delete

According to claim 1,
The high-performance machine reading comprehension method through feature selection, characterized in that the bidirectional co-attention step between the context and the query finds the relationship between the context and the query from the context encoding vector and the query encoding vector of the encoding step.

According to claim 1,
The high-performance machine reading comprehension method through feature selection, characterized in that, in the response extraction step in which the feature selection is reflected, the selected features are concatenated with the existing co-attention result vector and transferred to the pointer network.