KR20200063281A

KR20200063281A - Apparatus for generating Neural Machine Translation model and method thereof

Info

Publication number: KR20200063281A
Application number: KR1020180141427A
Authority: KR
Inventors: Shin Jong-hoon (신종훈)
Original assignee: Electronics and Telecommunications Research Institute (한국전자통신연구원)
Priority date: 2018-11-16
Filing date: 2018-11-16
Publication date: 2020-06-05

Abstract

A translation method based on a neural machine translation model that includes an encoder and a decoder comprises the following steps:
- A pre-processor separates an input sentence into sub-word input tokens. It also generates word-order sequence information, which holds a number assigned to each input token according to the order of the words in the input sentence.
- A positional embedding vector generator uses the word-order sequence information to generate a positional embedding vector representing the position of each input token.
- An attention layer block combines the positional embedding vector with an attention score. The score is computed from the encoder's top hidden-layer state for each token and the decoder's current hidden state.
- A weighted average calculator generates a weighted average value, which the decoder uses to predict a target token for each input token. It is computed from the attention score combined with the positional embedding vector and the encoder's top hidden-layer state for each token.
- The decoder predicts the target token using the weighted average value received from the weighted average calculator.

Description

Neural machine translation apparatus and method thereof {Apparatus for generating Neural Machine Translation model and method thereof}

The present invention relates to neural machine translation technology.

A neural machine translation (NMT) model based on the encoder-decoder mechanism generates an output sentence (translation) for an input sentence. It does so using either a recurrent neural network with long short-term memory (RNN-LSTM) or a convolutional neural network (CNN).

In a conventional NMT model using the encoder-decoder mechanism, the encoder converts the input words (input tokens) of an input sentence into a single N-dimensional vector. The decoder then recursively predicts the next target word (next target token) from that single vector and the target words (target tokens) generated so far.

Before the encoder can convert the input words into a single N-dimensional vector, the input sentence goes through a pre-processing step. This step simply splits the sentence into sub-word input tokens without considering how those tokens relate to one another, and the resulting tokens are then fed to the encoder.

The encoder therefore builds its vectors from input tokens that carry no information about their mutual relationships, and passes those vectors to the decoder. As a result, the decoder does not correctly receive word meanings that depend on the relationships between input tokens.

For this reason, the decoder cannot reliably predict the correct target word (target token) for each input token. This is a major cause of degraded translation quality in the target sentence built from those target tokens.

An object of the present invention is to provide an apparatus and method for generating a neural machine translation model. The model generates a condition variable reflecting the relationships among the input words (input tokens) of an input sentence. It passes that condition variable to the decoder together with the encoder output, so that the decoder can correctly predict the target token (target word) for each input token.

To achieve the above object, one aspect of the present invention provides a translation method based on a neural machine translation model that includes an encoder and a decoder. The method includes the following steps:
- In a pre-processing unit, separating an input sentence into sub-word input tokens, and generating word-order sequence information that holds a number assigned to each input token according to the order of the words in the input sentence.
- In a positional embedding vector generator, generating a positional embedding vector, in vector form, that represents the position of each input token, using the word-order sequence information.
- In an attention layer block, combining the positional embedding vector with an attention score computed from the encoder's top hidden-layer state for each token and the decoder's current hidden state.
- In a weighted average calculator, generating a weighted average value that the decoder uses to predict a target token for each input token, using the attention score combined with the positional embedding vector and the encoder's top hidden-layer state for each token.
- In the decoder, predicting the target token using the weighted average value received from the weighted average calculator.

In the encoder-decoder NMT model of the present invention, the decoder receives a condition variable together with the encoder output. This variable reflects the relationships among the input words (input tokens) of the sentence fed to the encoder. Based on it, the decoder can correctly predict the target word (target token) for each input word, which improves translation quality.

FIG. 1 is a block diagram of a neural machine translation model, used to explain the attention mechanism applied to the present invention.
FIG. 2 is a block diagram of an encoder-decoder-based neural machine translation model according to an embodiment of the present invention.
FIG. 3 is a block diagram of a device equipped with the neural machine translation model according to an embodiment of the present invention.
FIG. 4 is a flowchart of a translation method using the neural machine translation model according to an embodiment of the present invention.

The present invention may be modified in various ways and embodied in various forms. Specific embodiments are illustrated in the drawings and described in detail. However, this is not intended to limit the various embodiments of the present invention to those specific forms. They should be understood to include all modifications, equivalents, and substitutes within the spirit and technical scope of the various embodiments. In the description of the drawings, similar reference numerals are used for similar elements.

Expressions such as "includes" or "may include", as used in various embodiments of the present invention, indicate the presence of the disclosed function, operation, or component. They do not exclude one or more additional functions, operations, or components. Likewise, terms such as "include" or "have" designate the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification. They should be understood not to preclude the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Hereinafter, embodiments of the present invention are described in detail with reference to the drawings. To aid understanding, two mechanisms are described first: the encoder-decoder mechanism, and the attention mechanism (Bahdanau et al., '15; Luong et al., '15) applied to the encoder-decoder-based neural machine translation model of the present invention.

Encoder-decoder mechanism

The neural machine translation model of the present invention performs automatic translation based on a mechanism that includes an encoder and a decoder.

In an NMT model, the encoder compresses (abstracts) an input sentence in the source language into one or more N-dimensional vectors. The decoder generates an output sentence (translation result) in the target language from those compressed vectors.

As is well known, many studies have addressed network architectures and training methods for implementing this mechanism. It is now known as a core technology behind services offered by companies such as Google and Naver.

In an encoder-decoder NMT model, the encoder first compresses the source-language input sentence into a single N-dimensional vector. The decoder then recursively predicts the next word from the encoder output and the target words generated so far.

Saying that the encoder compresses the source sentence means that it turns the sentence into a series of sentence vectors. These vectors capture both the semantic and the syntactic characteristics of the source sentence.

The decoder acts as a kind of language model that generates arbitrary sentences made of target-language words. It takes the abstract representation of the source sentence, produced by the encoder, as a condition variable. It then predicts target words according to how it has learned to generate the target language.

The decoder repeats this prediction until it emits a reserved end-of-sentence token marking the completion of the target sentence. It then outputs the candidate target sentence judged most natural among those generated during the iterations.
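The patent does not say how the candidate target sentences are searched. Beam search is a common choice, and the sketch below assumes it. It keeps the best few partial translations at each step and stops once they end in the end-of-sentence token. The `toy_decoder` table is a hypothetical stand-in for a trained decoder, and `</s>` is an assumed name for the reserved end-of-sentence token.

```python
import math
from typing import Callable, Dict, List, Tuple

EOS = "</s>"  # hypothetical reserved end-of-sentence token

Prefix = Tuple[str, ...]

def beam_search(next_probs: Callable[[Prefix], Dict[str, float]],
                beam_size: int = 2, max_len: int = 10) -> List[str]:
    """Keep the `beam_size` best partial translations at each step and return
    the highest-scoring one. `next_probs` must return a non-empty distribution
    with positive probabilities."""
    beams: List[Tuple[float, Prefix]] = [(0.0, ())]   # (log-probability, tokens)
    finished: List[Tuple[float, Prefix]] = []
    for _ in range(max_len):
        candidates = [(score + math.log(p), prefix + (tok,))
                      for score, prefix in beams
                      for tok, p in next_probs(prefix).items()]
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, seq in candidates[:beam_size]:
            # A candidate ending in EOS is a complete target sentence.
            (finished if seq[-1] == EOS else beams).append((score, seq))
        if not beams:
            break
    best = max(finished or beams, key=lambda c: c[0])
    return [tok for tok in best[1] if tok != EOS]

def toy_decoder(prefix: Prefix) -> Dict[str, float]:
    """Hypothetical next-token distributions standing in for a trained decoder."""
    table = {
        (): {"this": 0.6, "it": 0.4},
        ("this",): {"is": 0.9, EOS: 0.1},
        ("it",): {"is": 0.5, EOS: 0.5},
        ("this", "is"): {"a": 0.3, "thread": 0.7},
        ("it", "is"): {"thread": 0.9, EOS: 0.1},
    }
    return table.get(prefix, {EOS: 1.0})

print(beam_search(toy_decoder))  # ['this', 'is', 'thread']
```

With `beam_size=1` this reduces to greedy decoding. Larger beams trade speed for a wider search over candidate sentences.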

This mechanism cannot tell which input word (input token) of the source sentence gave rise to a given target word (target token). There is therefore no explicit, direct correspondence between the input words of the source sentence and the target words of the target sentence.

Building an NMT system with these characteristics involves two phases. In the training phase, the network learns from a set of sentence pairs in the source and target languages (a bilingual corpus). In the prediction phase, the trained model performs machine translation.

For training, the source-language input is cut into units according to some criterion, and each unit is converted into a one-dimensional vector. Together these vectors form a two-dimensional embedding vector (matrix). In this process, the vocabularies of input tokens (source side) and target tokens (target side) used in training become fixed.

For example, suppose the source-language training corpus contains 1,000 unique words, and each word is represented by a one-dimensional vector of 500 floating-point numbers. The result is a 1000 × 500 two-dimensional array, and this array is the embedding vector.

Similarly, if the target-language corpus has 500 unique tokens represented by vectors of the same length, they form a 500 × 500 two-dimensional array.

Training then operates on the source words of the input sentence and the target words of the target sentence, each converted into the embedding vector of its unique token.
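The sizes in the example above can be made concrete with NumPy. In this minimal sketch, each embedding table is a matrix with one 500-float row per unique token, and embedding a sentence is a row lookup by token id. Random initialization stands in for learned values, and the token ids in `sentence_ids` are hypothetical.

```python
import numpy as np

SRC_VOCAB, TGT_VOCAB, DIM = 1000, 500, 500  # sizes from the example above

rng = np.random.default_rng(0)
# One DIM-length row per unique token: the 2-D embedding table.
src_embedding = rng.normal(scale=0.1, size=(SRC_VOCAB, DIM)).astype(np.float32)
tgt_embedding = rng.normal(scale=0.1, size=(TGT_VOCAB, DIM)).astype(np.float32)

def embed(token_ids, table):
    """Replace each token id with its row of the embedding table."""
    return table[np.asarray(token_ids)]

sentence_ids = [12, 7, 999]          # hypothetical ids of a 3-token source sentence
vectors = embed(sentence_ids, src_embedding)
print(src_embedding.shape, tgt_embedding.shape, vectors.shape)
# (1000, 500) (500, 500) (3, 500)
```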

When unique tokens are represented by embedding vectors in this way, some words will be rare or will never have appeared in the bilingual training corpus. All such words are treated as a single reserved unknown-word token and replaced with one embedding vector reserved for unknown words. If the embedding table is viewed as a kind of dictionary, every lookup of a token not in the dictionary returns this predefined vector. The effect is not limited to the encoder, which abstracts the source sentence. The decoder, which generates the target sentence, also emits the unknown-word token whenever it would need to generate a word sequence it never learned.
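The dictionary view described above can be sketched as follows. The vocabulary keeps the most frequent tokens, and every out-of-vocabulary lookup falls back to one reserved id. The token name `<unk>`, the reserved id 0, and the frequency cutoff are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, List

UNK = "<unk>"  # hypothetical name for the reserved unknown-word token

def build_vocab(corpus_tokens: List[str], max_size: int) -> Dict[str, int]:
    """Keep the most frequent tokens; id 0 is reserved for unknown words."""
    vocab = {UNK: 0}
    for tok, _ in Counter(corpus_tokens).most_common(max_size - 1):
        vocab[tok] = len(vocab)
    return vocab

def to_ids(tokens: List[str], vocab: Dict[str, int]) -> List[int]:
    """Map each token to its id, falling back to the single UNK id."""
    return [vocab.get(tok, vocab[UNK]) for tok in tokens]

vocab = build_vocab("the cat sat on the mat the cat".split(), max_size=4)
print(vocab)                                  # {'<unk>': 0, 'the': 1, 'cat': 2, 'sat': 3}
print(to_ids(["the", "dog", "sat"], vocab))   # [1, 0, 3]
```

Both "dog" (never seen) and "mat" (seen but cut by the size limit) map to the same id, and therefore to the same embedding row.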

Because of how NMT works, the vocabulary is fixed during training and cannot be updated without training a new network.

Setting a very large vocabulary lets the model handle more words. However, every additional word adds parameters the network must handle (rows in the embedding and output layers), so the vocabulary cannot be enlarged indefinitely.

Accordingly, input words can be cut into statistically frequent character sequences. These sub-word units are shorter than the "words" usually regarded as meaningful units, which reduces the size of the word dictionary. Sub-word tokenization methods such as WordPiece (Schuster '12) and BPE (Byte-Pair Encoding, Sennrich '15) can be used.
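The BPE idea can be shown on a toy scale: start from characters and repeatedly merge the most frequent adjacent pair. The sketch below follows that scheme, with an end-of-word marker `</w>`. It is illustrative only; it is not the patent's tokenizer and not any library's API. The word counts in the example are made up.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

def learn_bpe(word_freqs: Dict[str, int], num_merges: int) -> List[Tuple[str, str]]:
    """Learn BPE merges: repeatedly fuse the most frequent adjacent symbol pair."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = {" ".join(word) + " </w>": f for word, f in word_freqs.items()}
    merges: List[Tuple[str, str]] = []
    for _ in range(num_merges):
        pairs: Counter = Counter()
        for word, f in vocab.items():
            symbols = word.split()
            for a, b in zip(symbols, symbols[1:]):
                pairs[a, b] += f
        if not pairs:
            break
        best = max(pairs, key=pairs.get)   # ties: first pair encountered wins
        merges.append(best)
        # Fuse the pair wherever it occurs as two whole symbols.
        pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(best)) + r"(?!\S)")
        vocab = {pattern.sub("".join(best), w): f for w, f in vocab.items()}
    return merges

def apply_bpe(word: str, merges: List[Tuple[str, str]]) -> List[str]:
    """Segment a word by replaying the learned merges in order."""
    symbols = list(word) + ["</w>"]
    for a, b in merges:
        i = 0
        while i < len(symbols) - 1:
            if symbols[i] == a and symbols[i + 1] == b:
                symbols[i:i + 2] = [a + b]
            else:
                i += 1
    return symbols

merges = learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, num_merges=3)
print(merges)                       # [('e', 's'), ('es', 't'), ('est', '</w>')]
print(apply_bpe("lowest", merges))  # ['l', 'o', 'w', 'est</w>']
```

The unseen word "lowest" is still covered by known pieces, which is why sub-word dictionaries are robust to unknown words.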

Meanwhile, encoder-decoder NMT converts the entire input sentence into a single vector. As a result, the meaning of individual words is sometimes not conveyed correctly when the target words (target tokens) are generated.

To address this, the attention mechanism (Bahdanau et al., '15; Luong et al., '15) is used. Each time a target token is generated, it passes to the decoder a condition variable in which particular input tokens of the input sentence are weighted. Because the weights are assigned per input token, splitting the input into sub-word units means the decoder receives a weight vector over partial-word tokens rather than over whole words. These weights are recomputed every time the decoder generates a target sub-word token.

In broader terms, the present invention targets NMT systems that use an attention model with a sub-word-unit dictionary. Such a dictionary effectively reduces the number of model parameters and is generally robust to unknown words. However, when a target word is generated, the conventional attention score does not point to the whole word to which a sub-word token belongs, so the meaning is not conveyed correctly. The invention effectively mitigates this problem, and it can easily be added to NMT systems implemented with conventional techniques.

The attention mechanism applied to the encoder-decoder-based NMT model of the present invention is described in detail below.

Attention mechanism

FIG. 1 is a block diagram of an encoder-decoder-based NMT model to which the attention mechanism used in the present invention is applied.

Referring to FIG. 1, the encoder-decoder-based NMT model with the attention mechanism basically includes a pre-processor 101, an encoder 102, and a decoder 201. It further includes an attention layer block 105 that implements the attention mechanism. The attention layer block 105 comprises an attention score (a_t(s)) calculator 103 and a weighted average value (C_t) calculator 104.

The pre-processor 101 splits the input sentence 10 into sub-word input tokens 20 and feeds the input tokens 20 to the encoder 102.

Take as an example the input sentence 10 "이것은 쓰레드라고 불리는 것으로, 프로세스에 비해 가벼운 구조로 되어 있다." ("This is called a thread, and it has a lighter structure than a process."). Pre-processing splits it into "이것 *은 쓰 *레드 *라고 불리 *는 것 *으 *로, 프로 *세스 *에 비 *해 가벼 *운 구조 *로 되 *어 있 *다". Here, '*' means the token attaches to the preceding token without a space.
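The '*' convention is lossless. This short sketch, written for illustration and not taken from the patent, shows that gluing every '*'-prefixed token to its predecessor rebuilds the original spacing of the example sentence (without its final period).

```python
def detokenize(tokens):
    """Rebuild surface text: a token starting with '*' is glued to the previous one."""
    text = ""
    for tok in tokens:
        if tok.startswith("*"):
            text += tok[1:]                        # no space before this piece
        else:
            text += (" " if text else "") + tok    # new space-separated word
    return text

subwords = ("이것 *은 쓰 *레드 *라고 불리 *는 것 *으 *로, 프로 *세스 *에 "
            "비 *해 가벼 *운 구조 *로 되 *어 있 *다").split()
print(detokenize(subwords))
# 이것은 쓰레드라고 불리는 것으로, 프로세스에 비해 가벼운 구조로 되어 있다
```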

The encoder 102 is a recurrent neural network (RNN) built with long short-term memory (LSTM) cells. It processes the pre-processed input by repeated vector operations, carrying the value computed for each token in the input tokens 20 over into the computation for the next token.

The encoder 102 consists of at least one hidden layer, and in general N hidden layers. Its top hidden-layer state h'_s is passed to the attention layer block 105 together with the current hidden state h_t of the decoder 201. From these, the attention layer block 105 computes a weighted average value C_t that indicates which token elements the decoder's current hidden state h_t needs most.

The computation in the attention layer block 105 is based on the "global attention model" proposed in Luong '15 ("Effective Approaches to Attention-based Neural Machine Translation", in Proc. of EMNLP '15, arXiv:1508.04025).

The processing performed in the attention layer block 105 is described in more detail below.

Let h'_s be the top hidden-layer state of the encoder 102 for each token s, and let h_t be the current hidden state of the decoder 201. The attention score a_t(s) computed by the attention score calculator 103 in the attention layer block 105 is given by Equation 1:

Equation 1:
$$a_t(s) = \mathrm{align}(h_t, h'_s) = \frac{\exp\big(\mathrm{score}(h_t, h'_s)\big)}{\sum_{s'} \exp\big(\mathrm{score}(h_t, h'_{s'})\big)}$$

In Equation 1, score(h_t, h'_s) is computed by Equation 2:

Equation 2:
$$\mathrm{score}(h_t, h'_s) = h_t^{\top} W_a \, h'_s$$

Here, h_t is the state of the decoder given by its current hidden state at step t. The step number increases by one with each token the decoder generates: the first step uses h_1, the second uses h_2, and so on. That is, h_t is the decoder state at the moment the t-th target token is generated. The superscript ⊤ denotes transposition.

W_a is a learning parameter: a weight that is adjusted during training so that the network produces the best result. In other words, it is optimized so that attention points at the correct positions.

The weighted average value calculator 104 combines the resulting attention scores a_t(s) with the top hidden-layer states h'_s of the encoder 102 for each token through a vector product. This yields the weighted average value C_t over the input tokens 20:

$$C_t = \sum_{s} a_t(s) \, h'_s$$

C_t is then used as a condition variable for generating and predicting the target token 202 produced by the decoder 201.
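The global attention computation can be sketched in NumPy. This sketch assumes the "general" score function of Luong '15, score(h_t, h'_s) = h_t^T W_a h'_s. It computes the scores, normalizes them with a softmax into a_t(s), and takes the weighted average C_t = Σ_s a_t(s) h'_s of the encoder states. The toy vectors are made up, and an identity matrix stands in for a learned W_a.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())          # subtract max for numerical stability
    return e / e.sum()

def global_attention(h_t: np.ndarray, H_enc: np.ndarray, W_a: np.ndarray):
    """Luong-style global attention with the 'general' score.

    h_t:   (d,)   current decoder hidden state
    H_enc: (S, d) top-layer encoder state h'_s for each of the S input tokens
    W_a:   (d, d) learned weight matrix
    """
    scores = H_enc @ (W_a.T @ h_t)   # score(h_t, h'_s) = h_t^T W_a h'_s, for every s
    a_t = softmax(scores)            # attention scores a_t(s), summing to 1
    c_t = a_t @ H_enc                # weighted average C_t = sum_s a_t(s) h'_s
    return a_t, c_t

h_t = np.array([1.0, 0.0])
H_enc = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
a_t, c_t = global_attention(h_t, H_enc, np.eye(2))
print(a_t.round(3), c_t.round(3))
```

Note that a_t(s) has one entry per input token s. With sub-word input, the weights therefore attach to pieces such as "쓰" and "*레드" separately, not to the whole word.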

Thus, when the input tokens 20 are finely split into sub-word units, the weighted average values C_t are computed per sub-word token and cannot carry the full meaning of a word. In other words, C_t does not reflect relationships among the input words (input tokens), such as their order, so the full meaning of each word is not passed to the decoder.

Accordingly, the present invention provides a new NMT model that can pass to the decoder a condition variable (the weighted average value C_t) reflecting relationships among the input words (input tokens), such as their order.

The encoder-decoder-based NMT model of the present invention, to which the new attention mechanism is applied, is described in detail below with reference to FIG. 2.

FIG. 2 is a block diagram of an encoder-decoder-based NMT model to which a new attention mechanism is applied, according to an embodiment of the present invention.

Referring to FIG. 2, the encoder-decoder-based NMT model with the new attention mechanism according to an embodiment (hereinafter, "the NMT model according to an embodiment of the present invention") includes the following components: a pre-processor 301, an encoder 303, a positional embedding vector (E_p) generator 305, an attention layer block 307, and a decoder 401.

The encoder 303 and decoder 401 perform substantially the same functions as the encoder 102 and decoder 201 of the NMT model in FIG. 1. Their description is therefore replaced by the description of the encoder 102 and decoder 201 of FIG. 1.

The pre-processor 301 is similar to the pre-processor 101 of FIG. 1 in that it cuts (splits) the input sentence 10 into sub-word input tokens 20. It differs in that it additionally generates correlation information indicating the relationships among the words in the input tokens 20. The generated correlation information is input to the positional embedding vector (E_p) generator 305.

The correlation information is word-order sequence information 30. It indicates the word order (①→②→③→④→⑤) of the input words contained in the input tokens 20: '이것' ("this"), the particle '*은', '쓰' and '*레드' (the two sub-word pieces of '쓰레드', "thread"), and the quotative particle '*라고'.

To generate this word-order sequence information, the pre-processing unit 301 segments the input sentence 10 into morphemes using an analysis technique such as morphological analysis.

Because of this morpheme-level segmentation, the case particles '*은' and '*라고' contained in the input tokens 20 are treated as separate input tokens. In other words, a word sequence is split into separate input tokens even when it is written contiguously without spaces.

The pre-processing unit 301 assigns sequential numbers to the input tokens separated into morpheme units. The numbering process performed by the pre-processing unit 301 is as follows.

Original text
"이것은 쓰레드라고 불리는 것으로, 프로세스에 비해 가벼운 구조로 되어 있다"
("This is called a thread, and it has a lighter structure than a process.")
Word separation result by morphological analysis (words separated by the '/' delimiter)
"이것/은/쓰레드/라고/불리/는/것/으로/프로세스/에/비해/가벼/운/구조/로/되/어/있다"
Sub-word token separation result
"이것 *은 쓰 *레드 *라고 불리 *는 것 *으 *로, 프로 *세스 *에 비 *해 가벼 *운 구조 *로 되 *어 있 *다"
Word-order sequence information
"이것(1) *은(2) 쓰(3) *레드(3) *라고(4) 불리(5) *는(6) 것(7) *으(8) *로(8), 프로(9) *세스(9) *에(10) 비(11) *해(11) 가벼(12) *운(13) 구조(14) *로(15) 되(16) *어(17) 있(18) *다(18)"
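The numbering rule can be sketched as an alignment between the morpheme sequence and the sub-word token sequence. This is a minimal Python sketch, not the patent's procedure. It assumes that concatenating the sub-word pieces reproduces each morpheme exactly, once the leading '*' continuation marker and any attached comma are removed. The patent does not say how the alignment is computed, and the function name `word_order_sequence` is hypothetical.

```python
def word_order_sequence(morphemes, subword_tokens, marker="*"):
    """Give every sub-word token the 1-based index of the morpheme it belongs to.

    Assumes the sub-word pieces, once the continuation marker and any attached
    comma are stripped, concatenate exactly into the morpheme sequence.
    """
    sequence = []
    m_idx = 0      # index of the morpheme currently being filled
    buffer = ""    # characters of the current morpheme consumed so far
    for token in subword_tokens:
        if m_idx >= len(morphemes):
            raise ValueError(f"extra token {token!r} after the last morpheme")
        buffer += token.lstrip(marker).rstrip(",")
        target = morphemes[m_idx]
        if not target.startswith(buffer):
            raise ValueError(f"token {token!r} does not align with morpheme {target!r}")
        sequence.append(m_idx + 1)
        if buffer == target:   # morpheme complete: move on to the next one
            m_idx += 1
            buffer = ""
    if m_idx != len(morphemes):
        raise ValueError("sub-word tokens end in the middle of a morpheme")
    return sequence


morphemes = "이것/은/쓰레드/라고/불리/는/것/으로/프로세스/에/비해/가벼/운/구조/로/되/어/있다".split("/")
tokens = "이것 *은 쓰 *레드 *라고 불리 *는 것 *으 *로, 프로 *세스 *에 비 *해 가벼 *운 구조 *로 되 *어 있 *다".split()
print(word_order_sequence(morphemes, tokens))
# prints [1, 2, 3, 3, 4, 5, 6, 7, 8, 8, 9, 9, 10, 11, 11, 12, 13, 14, 15, 16, 17, 18, 18]
```

The output matches the word-order sequence listed in the example. Sub-word pieces of one morpheme, such as '쓰' and '*레드', share a number.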

The positional embedding vector (E_p) generator 305 generates a positional embedding vector E_p. It does so by converting the position of each word within the input tokens 20 into vector form, based on the word-order sequence information 30 generated by the pre-processing unit 301.

The positional embedding vector E_p is generated from the word-order sequence information 30, which reflects the order of the words in the input tokens 20. It therefore carries the correlation between the input words contained in the input sentence 10.

The positional embedding vector E_p may be computed in one of two ways, for example:
(1) As a word embedding: before each input token 20 is input to the encoder 303, it is substituted with a two-dimensional real-valued vector that depends on the token's order value.
(2) According to the sinusoidal positional embedding technique, which uses sine-wave transforms (Vaswani et al., "Attention Is All You Need", 2017).
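The sinusoidal option can be sketched by feeding the word-order numbers, rather than the raw sub-word positions, into the formula of Vaswani et al. (2017): PE(pos, 2i) = sin(pos / 10000^(2i/d)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d)). This is a minimal pure-Python sketch. The function name and the choice d_model = 8 are illustrative assumptions.

```python
import math


def sinusoidal_positional_embedding(order_values, d_model):
    """Return one d_model-dimensional sinusoidal vector per token.

    order_values are word-order numbers, so sub-word tokens of the same
    word map to the same vector.
    """
    if d_model % 2 != 0:
        raise ValueError("d_model must be even")
    table = []
    for pos in order_values:
        vec = []
        for two_i in range(0, d_model, 2):
            angle = pos / (10000 ** (two_i / d_model))
            vec.append(math.sin(angle))  # even dimension 2i
            vec.append(math.cos(angle))  # odd dimension 2i + 1
        table.append(vec)
    return table


# Word-order numbers of the first five sub-word tokens: 이것(1) *은(2) 쓰(3) *레드(3) *라고(4)
E_p = sinusoidal_positional_embedding([1, 2, 3, 3, 4], d_model=8)
print(E_p[2] == E_p[3])  # True: '쓰' and '*레드' share a position vector
```

Sub-word pieces of one word receive identical vectors. This is what lets the later attention steps treat them as a single word. The learned alternative would replace this function with a trainable lookup table indexed by the same order numbers.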

The positional embedding vector E_p, which reflects the correlation between the input words, is input to the attention layer block 307. The attention layer block 307 then passes a weighted average value C_t, which also reflects this correlation, to the decoder 401. C_t serves as a condition variable that points to the input elements most needed to produce the target token for the decoder's current hidden state value h_t.

To this end, the attention layer block 307 includes a combining unit 307A, an attention score (a_t(s)) calculator 307B, and a weighted average value (C_t) calculator 307C.

The combining unit 307A combines the positional embedding vector E_p, input from the positional embedding vector generator 305, with the attention score a_t(s) calculated by the attention score calculator 307B. The result is a new attention score a_t(s) that reflects the word order of the input words. To perform the combination, the combining unit may, for example, compute a vector product between the positional embedding vector E_p and the attention score a_t(s). Here, the attention score a_t(s) may be calculated by Equations 1 and 2 described above.
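Equations 1 and 2 are not reproduced in this section, and "vector product" admits more than one reading. The sketch below is therefore only one possible interpretation, under two stated assumptions:
(1) The score is the Luong-style dot product h_t · h̄_s, normalized with a softmax over source positions s.
(2) The combination scales each positional embedding E_p(s) by the scalar score a_t(s), that is, an element-wise product.
All function names are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attention_scores(h_t, encoder_states):
    """Assumed form of a_t(s): softmax over s of the dot product h_t . h_bar_s."""
    raw = [dot(h_t, h_s) for h_s in encoder_states]
    m = max(raw)                        # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in raw]
    total = sum(exps)
    return [e / total for e in exps]


def combine_with_position(a_t, e_p):
    """Hypothetical element-wise reading: scale each E_p(s) by the scalar a_t(s)."""
    if len(a_t) != len(e_p):
        raise ValueError("one positional vector is needed per source token")
    return [[a * x for x in vec] for a, vec in zip(a_t, e_p)]


a = attention_scores([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
print(combine_with_position([0.5, 0.5], [[1.0, 2.0], [3.0, 4.0]]))  # [[0.5, 1.0], [1.5, 2.0]]
```

Under this reading, the combined score for source token s becomes a vector rather than a scalar. Tokens that share a word-order number therefore share the same positional factor.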

The result produced by the combining unit 307A is input to the weighted average value (C_t) calculator 307C. The calculator 307C computes a vector product of this result with the top-layer hidden state value of each token of the encoder 303. This yields the weighted average value C_t over all words in the input tokens 20, which the decoder 401 uses as a condition variable to generate and predict the target tokens 202.
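The weighted average value C_t can be sketched under the same hypothetical element-wise reading. It is the sum, over source positions s, of the combined score vector multiplied element-wise by the encoder's top-layer hidden state h̄_s. This assumes the positional embedding and the hidden state have the same dimension, and the function name is hypothetical.

```python
def weighted_average_value(combined_scores, encoder_states):
    """C_t = sum over s of combined_scores[s] * encoder_states[s], element-wise.

    combined_scores[s] is the position-aware score vector for source token s;
    encoder_states[s] is the encoder's top-layer hidden state for token s.
    """
    if len(combined_scores) != len(encoder_states):
        raise ValueError("one score vector is needed per source token")
    dim = len(encoder_states[0])
    c_t = [0.0] * dim
    for weights, h_s in zip(combined_scores, encoder_states):
        if len(weights) != dim or len(h_s) != dim:
            raise ValueError("score vectors and hidden states must share a dimension")
        for k in range(dim):
            c_t[k] += weights[k] * h_s[k]
    return c_t


print(weighted_average_value([[0.5, 1.0], [1.5, 2.0]], [[2.0, 2.0], [1.0, 1.0]]))  # [2.5, 4.0]
```

If every E_p(s) were a vector of ones, the result would reduce to the standard attention context vector Σ_s a_t(s) h̄_s. The positional factor is therefore the only change relative to ordinary attention.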

As described above, the condition variable (the weighted average value C_t) is calculated from the new attention score a_t(s), which reflects the word order of the input words. The decoder 401 therefore generates and predicts the target tokens 202 using a condition variable that reflects the interrelationships between the input words (input tokens). As a result, it can correctly predict the target word (target token) for each input word in the input sentence, which improves translation quality.

FIG. 3 is a block diagram of a device equipped with the neural network automatic translation model according to an embodiment of the present invention.

Referring to FIG. 3, the device 500 equipped with the neural network automatic translation model according to an embodiment of the present invention (a neural network automatic translation device) may be an electronic device. The electronic device may include at least one of the following, for example:
- a smartphone, a tablet personal computer (PC), a mobile phone, a video phone, or an e-book reader;
- a desktop PC, a laptop PC, or a netbook computer;
- a personal digital assistant (PDA), a portable multimedia player (PMP), an MP3 player, a mobile medical device, or a camera;
- a wearable device (e.g., a head-mounted device (HMD) such as electronic glasses, electronic clothing, an electronic bracelet, an electronic necklace, an electronic appcessory, or a smart watch);
- a server, a gateway, or a router.

The neural network automatic translation model generating apparatus 500, which may be implemented as such an electronic device, includes a processor 510, a memory 520, a storage medium 530, a user interface 540, an output unit 550, and a communication unit 560.

The processor 510 may be implemented by one or more general-purpose microprocessors, digital signal processors (DSPs), hardware cores, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or any combination thereof. Based on user input from the user interface 540, it may generate the neural network automatic translation model shown in FIG. 2 in the form of a program or software. Here, the user input may be program code, commands, or the like written by a user to generate the model shown in FIG. 2.

That is, according to the user input, the processor 510 generates the following components, each in the form of an executable program or software:
- the pre-processing unit 301;
- the encoder 303;
- the positional embedding vector (E_p) generator 305;
- the attention layer block 307;
- the decoder 401.
In this way it builds the neural network automatic translation model, which it stores in the storage medium 530. Here, the storage medium 530 may be a non-volatile memory.


The processor 510 may execute the neural network automatic translation model stored in the storage medium 530 in an execution space provided by the memory 520. Here, the memory 520 may be a volatile memory.

The user interface 540 may be a key input device that generates the user input entered by a user. Examples of key input devices include a keyboard and a touch screen.

The output unit 550 provides the user with the translation results of the model executed by the processor 510, or with intermediate data generated during execution. It may be a display device.

The communication unit 560 transmits the translation results generated by the model to an external device over wired or wireless communication. To support modulation and wireless transmission, it may include a suitable modem, amplifier, filter, and frequency conversion components.

FIG. 4 is a flowchart illustrating a translation method of the neural network automatic translation model according to an embodiment of the present invention.

Referring to FIG. 4, in step S410, the pre-processing unit divides the input sentence into sub-word input tokens. It then generates word-order sequence information containing the number assigned to each input token according to the word order of the words in the input sentence.

In step S420, the positional embedding vector generator uses the word-order sequence information to generate a positional embedding vector, in vector form, that represents the position of each input token.

In step S430, the combining unit of the attention layer block combines the positional embedding vector with the attention score. The attention score is calculated from two inputs: the hidden state value of the encoder's top hidden layer for each token, and the current hidden state value from the decoder.

In step S440, the weighted average value calculator of the attention layer block generates a weighted average value, which the decoder uses to predict the target token for each input token. The calculator uses the attention score combined with the positional embedding vector and the hidden state value of the encoder's top hidden layer for each token.

In step S450, the decoder predicts the target token using the weighted average value input from the weighted average value calculator.
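Step S450 relies on the decoder of FIG. 1, whose output layer is not described in this section. Purely as an assumed illustration, the sketch uses the attentional output layer of Luong et al. (2015): h̃_t = tanh(W_c [C_t; h_t]), followed by a softmax over the target vocabulary. The weights and the three-word vocabulary are made-up toy values, not trained parameters, and the function names are hypothetical.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_target_token(c_t, h_t, W_c, W_s, vocab):
    """Pick the most probable target token from C_t and the decoder state h_t.

    h_tilde = tanh(W_c [C_t; h_t]); p = softmax(W_s h_tilde)  (assumed output layer)
    """
    h_tilde = [math.tanh(z) for z in matvec(W_c, c_t + h_t)]  # list '+' concatenates
    probs = softmax(matvec(W_s, h_tilde))
    best = max(range(len(probs)), key=probs.__getitem__)
    return vocab[best], probs


W_c = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]   # toy 2x4 projection
W_s = [[1.0, 1.0], [-1.0, -1.0], [0.0, 0.0]]         # toy 3x2 output layer
token, probs = predict_target_token([1.0, 0.0], [0.0, 1.0], W_c, W_s,
                                    ["thread", "process", "structure"])
print(token)  # thread
```

In a real decoder, the chosen token would be fed back to produce the next hidden state h_{t+1}. This sketch shows only a single prediction step.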

As described above, the encoder-decoder-based neural network automatic translation model of the present invention works in four stages:
(1) It generates the positional embedding vector E_p from the word-order sequence information 30.
(2) In the attention layer block 307 (307A, 307B, and 307C), it computes the vector product of E_p and the attention score a_t(s).
(3) From the result, it computes the weighted average value C_t over all words, including words split into sub-words, and passes C_t to the decoder 401.
(4) The decoder uses this new weighted average value C_t to predict the target token 202 for each token in the input tokens 20.

In this way, for a neural network automatic translation model that uses sub-word tokens, generating each target token refers to the entire source word to which a sub-word belongs, rather than to the individual sub-word token. More accurate target tokens can thus be generated. This addresses a drawback of the prior art: when generating the sub-word tokens that make up the target sentence, the full meaning of a word was not correctly conveyed.

The present invention has been described above with a focus on embodiments. These are merely examples and do not limit the present invention. Those of ordinary skill in the art to which the present invention pertains will appreciate that various modifications and applications not illustrated above are possible without departing from the essential characteristics of the present invention. For example, each component specifically shown in the embodiments may be implemented in modified form. Differences related to such modifications and applications should be construed as falling within the scope of the present invention as defined in the appended claims.

Claims

A translation method based on a neural network automatic translation model including an encoder and a decoder, the method comprising:
generating, by a pre-processing unit, input tokens by dividing an input sentence into sub-word units, and generating word-order sequence information including number information assigned to each input token according to the word order of the words included in the input sentence;
generating, by a positional embedding vector generator, a positional embedding vector in vector form representing position information of each input token, using the word-order sequence information;
combining, by an attention layer block, the positional embedding vector with an attention score calculated based on the hidden state value of the top hidden layer for each token input from the encoder and the current hidden state value input from the decoder;
generating, by a weighted average value calculator, a weighted average value used by the decoder to predict a target token for each input token, using the attention score combined with the positional embedding vector and the hidden state value of the top hidden layer for each token of the encoder; and
predicting, by the decoder, the target token using the weighted average value input from the weighted average value calculator.