KR20190053028A

KR20190053028A - Neural machine translation apparatus and method of operation thereof based on neural network learning using constraint strength control layer

Info

Publication number: KR20190053028A
Application number: KR1020170148981A
Authority: KR
Inventors: 신종훈; 김강일
Original assignee: 한국전자통신연구원
Priority date: 2017-11-09
Filing date: 2017-11-09
Publication date: 2019-05-17

Abstract

Disclosed is an operation method of a neural network machine translation apparatus. The operation method of a neural network machine translation apparatus using a constraint influence control layer comprises the steps of: generating a first conceptual density vector for an original text; generating a second conceptual density vector for a translated text corresponding to the original text; determining a distance between the generated first conceptual density vector and the generated second conceptual density vector; determining a predicted text for the translated text based on the original text and determining a cross entropy of the determined predicted text and the translated text; and training a neural network based on a loss function obtained using the determined distance and the determined cross entropy, wherein the first conceptual density vector is obtained by performing affine transformation on the original text and the second conceptual density vector is obtained using the number of unique lexical tokens of the translated text and an embedding vector length.

Description

TECHNICAL FIELD [0001] The present invention relates to a neural network machine translation apparatus based on neural network learning using a constraint influence control layer, and to an operation method thereof.

The present disclosure relates to a neural network learning method using a constraint influence control layer in a neural network machine translation apparatus, and more specifically to a neural network learning method of a neural network machine translation apparatus that trains the neural network by analyzing the distance between a concept density vector for the original text and a concept density vector for the translated text.

A recurrent neural network (RNN) is one form of deep neural network, consisting of a network in which hidden states, input data, and result nodes are connected. Because past information can be passed to the next step through the hidden state, which structurally corresponds to memory, an RNN is useful for capturing the dynamic patterns and characteristics inherent in time-series data. However, in an RNN the hidden state value of the previous time step affects the hidden state of the current time step during training, so as training proceeds the accumulated value can grow exponentially or rapidly converge to zero. To address this problem, the long short-term memory based recurrent neural network (long short term memory RNN, RNN-LSTM) was proposed. The RNN-LSTM structure uses three gates and a memory that stores dynamic information to filter the information of each hidden state, retain only the important information, and compute the output value, thereby preventing the divergence or convergence problem.
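By way of non-limiting illustration, the growth or decay of the accumulated value described above can be sketched with a scalar toy model. The function name `repeated_scaling` and the factors 1.5 and 0.5 are assumptions chosen for illustration and do not appear in the disclosure; an actual recurrent network multiplies by weight matrices rather than by a single scalar.

```python
# Toy illustration (not from the disclosure): a unit signal is scaled by the
# same factor at every time step, mimicking repeated recurrent updates.
def repeated_scaling(factor: float, steps: int) -> float:
    """Return the magnitude of a unit signal after `steps` multiplications."""
    value = 1.0
    for _ in range(steps):
        value *= factor
    return value


grows = repeated_scaling(1.5, 50)    # factor above 1: the value blows up
shrinks = repeated_scaling(0.5, 50)  # factor below 1: the value collapses toward 0
print(f"{grows:.3e}  {shrinks:.3e}")
```

The gates of an LSTM are one way of keeping such repeated scaling under control, since they decide at each step how much of the stored value is kept or overwritten.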

Neural machine translation (NMT) refers to the field of research that builds machine translation models based on various artificial neural networks. RNN-LSTM is a useful technique in this field, and much research has recently been conducted on its network structures and training methods. For example, techniques for learning translated sentences with RNN-LSTM generally use an encoder-decoder structure that receives an input sentence and generates an output sentence, and on this basis attention models, bidirectional models, input feeding models, and the like have been proposed.

A neural machine translation model based on RNN-LSTM generally includes an encoder that represents the input sentence as a single vector and a decoder that generates the output sentence using the input sentence. Specifically, the encoder compresses the input source-language sentence into a single input sentence vector. The sentence vector generated by the encoder is passed to the decoder, which generates the elements that make up the target sentence, such as words or characters, one at each time step. That is, the decoder receives the abstract information of the source-language sentence from the encoder as a conditioning variable (for example, as in a conditional language model), predicts the target language based on the learned sentence structure of the target language, and selects the element with the highest probability among the candidates that satisfy the input condition.

FIG. 1 is a diagram showing the operation of an encoder and a decoder in a long short-term memory based recurrent neural network (RNN-LSTM) according to an embodiment. Referring to FIG. 1, the decoder predicts the target language until the end-of-sentence reserved token (for example, <eos>), which signals completion of the target sentence, appears. The generation probability of each word produced by the decoder is therefore determined by the context vector, which represents the conditions carried over from the previous time step.
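The decoding loop described above can be sketched as follows. This is a minimal sketch under stated assumptions: `greedy_decode`, `toy_model`, and the fixed probabilities are hypothetical stand-ins for a trained decoder and are not taken from the disclosure.

```python
from typing import Callable, Dict, List

EOS = "<eos>"  # end-of-sentence reserved token


def greedy_decode(next_token_probs: Callable[[List[str]], Dict[str, float]],
                  max_len: int = 20) -> List[str]:
    """Pick the most probable token at each step until <eos> is chosen."""
    output: List[str] = []
    while len(output) < max_len:
        probs = next_token_probs(output)   # distribution conditioned on the prefix
        token = max(probs, key=probs.get)  # greedy choice: highest probability
        if token == EOS:
            break
        output.append(token)
    return output


# Hypothetical stand-in for a trained decoder: it always prefers the next
# token of one fixed sentence and gives every other token a small probability.
_SCRIPT = ["I", "have", "an", "apple", ".", EOS]


def toy_model(prefix: List[str]) -> Dict[str, float]:
    target = _SCRIPT[len(prefix)] if len(prefix) < len(_SCRIPT) else EOS
    return {tok: (0.9 if tok == target else 0.02) for tok in _SCRIPT}


decoded = greedy_decode(toy_model)
print(decoded)
```

The `max_len` guard is a design choice added here so that the loop terminates even if the model never emits the end-of-sentence token.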

Accordingly, when ideally trained, an RNN-LSTM-based neural machine translation model can take in all the information needed to generate an accurate output sentence. In practice, however, when the model becomes very large, the efficiency of training the model with the necessary information can decrease. In particular, after the input information of the original text is merged into a single input sentence vector, the output token with the maximum probability is determined greedily, conditioned on that vector. Because there are few ways to directly convey information about the relationship between the input token sequence of the original text and the output tokens, this relationship is not learned unless a separate model exists that evaluates the finally generated sentence.

The technical problem addressed by the present disclosure is to provide a neural network machine translation apparatus, and an operation method thereof, for improving the training efficiency and the translation accuracy of neural machine translation.

The technical problems to be solved by the present disclosure are not limited to those mentioned above, and other technical problems not mentioned will be clearly understood from the following description by those of ordinary skill in the art to which the present disclosure belongs.

An operation method of a neural network machine translation apparatus based on neural network learning using a constraint influence control layer may be provided, the method comprising: generating a first concept density vector for an original text; generating a second concept density vector for a translated text corresponding to the original text; determining a distance between the generated first concept density vector and the generated second concept density vector; determining a predicted text for the translated text based on the original text, and determining a cross entropy of the determined predicted text and the translated text; and training a neural network based on a loss function obtained using the determined distance and the determined cross entropy, wherein the first concept density vector is obtained by performing an affine transformation on the original text, and the second concept density vector is obtained using the number of unique vocabulary tokens of the translated text and an embedding vector length.

The features briefly summarized above are merely exemplary aspects of the detailed description of the present disclosure that follows, and do not limit the scope of the present disclosure.

According to the present disclosure, a neural network machine translation apparatus and method for improving the training efficiency and the translation accuracy of neural machine translation can be provided.

Also, according to the present disclosure, the additional constraint applied during neural network training can reduce the likelihood of generating unnecessary output sentences. Furthermore, by fixing the constraint vector of the output sentence, the constraint is kept from being satisfied at all times, so that the feedback needed to train each parameter of the encoder and the decoder can be delivered more strongly.

Also, according to the present disclosure, the training influence of a constraint can be controlled by adjusting the size of the network added to an RNN-LSTM-based translation model, and by using the added network to set additional constraints on the input and output as a cost function, an RNN-LSTM model reflecting more general constraints can be built.

The effects obtainable from the present disclosure are not limited to those mentioned above, and other effects not mentioned will be clearly understood from the following description by those of ordinary skill in the art to which the present disclosure belongs.

FIG. 1 is a diagram showing the operation of an encoder and a decoder in a long short-term memory based recurrent neural network according to an embodiment.
FIGS. 2 and 3 are block diagrams showing the configuration of a neural network machine translation apparatus using a constraint influence control layer according to an embodiment.
FIG. 4 is a block diagram showing the configuration of a neural network machine translation apparatus using a constraint influence control layer according to another embodiment.
FIG. 5 is a flowchart showing an operation method of a neural network machine translation apparatus using a constraint influence control layer according to an embodiment.

Hereinafter, embodiments of the present disclosure are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present disclosure belongs can readily practice them. However, the present disclosure may be implemented in many different forms and is not limited to the embodiments described herein.

In describing the embodiments of the present disclosure, detailed descriptions of well-known configurations or functions are omitted where they could obscure the gist of the present disclosure. Parts of the drawings unrelated to the description of the present disclosure are omitted, and similar parts are given similar reference numerals.

In the present disclosure, when a component is said to be "connected", "coupled", or "joined" to another component, this may include not only a direct connection but also an indirect connection in which yet another component exists between them. In addition, when a component is said to "include" or "have" another component, this means that, unless specifically stated otherwise, further components may be included rather than excluded.

In the present disclosure, terms such as first and second are used only to distinguish one component from another and, unless specifically stated, do not limit the order or importance of the components. Accordingly, within the scope of the present disclosure, a first component in one embodiment may be referred to as a second component in another embodiment, and likewise a second component in one embodiment may be referred to as a first component in another embodiment.

In the present disclosure, components are distinguished from one another in order to describe their respective features clearly, and this does not necessarily mean that the components are separate. That is, a plurality of components may be integrated into a single hardware or software unit, and a single component may be distributed across a plurality of hardware or software units. Accordingly, even where not separately mentioned, such integrated or distributed embodiments are included within the scope of the present disclosure.

In the present disclosure, the components described in the various embodiments are not necessarily essential components, and some may be optional. Accordingly, an embodiment consisting of a subset of the components described in one embodiment is also included within the scope of the present disclosure. In addition, an embodiment that includes other components in addition to those described in the various embodiments is also included within the scope of the present disclosure.

Hereinafter, embodiments of the present disclosure are described with reference to the accompanying drawings.

FIGS. 2 and 3 are block diagrams showing the configuration of a neural network machine translation apparatus using a constraint influence control layer according to an embodiment.

Referring to FIG. 2, a neural network machine translation apparatus 200 according to an embodiment may include an input unit 210, a control unit 220, and a storage unit 230. However, these are only some of the components needed to describe the present embodiment, and the components included in the neural network machine translation apparatus 200 are not limited to this example.

For example, referring to FIG. 3, a neural network machine translation apparatus 300 may further include an encoder 310, a decoder 320, a constraint influence control layer unit 330, and the like. The neural network machine translation apparatus 300 of FIG. 3 may correspond to the neural network machine translation apparatus 200 of FIG. 2.

A deep neural network is an artificial neural network composed of multiple hidden layers between an input layer and an output layer, and it can model complex nonlinear relationships through its many hidden layers. A neural network structure that achieves a high level of abstraction by increasing the number of layers in this way is referred to as deep learning. Deep learning learns from a very large amount of data and, when new data is input, selects the answer with the highest probability based on what it has learned, so it can operate adaptively even if the image environment changes. Because it automatically finds feature factors in the course of learning a model from data, attempts to apply it in the field of machinery have recently been increasing. Representative examples of deep learning include the convolutional neural network, the recurrent neural network (RNN), and the autoencoder.

The neural network machine translation apparatus 200 can train the neural network by analyzing the distance between the concept density vector for the original text and the concept density vector for the translated text.

According to an embodiment, the neural network machine translation apparatus 200 based on a long short-term memory recurrent neural network (long short term memory RNN, RNN-LSTM) may generate a first concept density vector for the original text, generate a second concept density vector for the translated text corresponding to the original text, determine the distance between the generated first concept density vector and the generated second concept density vector, determine a predicted text for the translated text based on the original text, determine the cross entropy of the determined predicted text and the translated text, and train the neural network based on a loss function obtained using the determined distance and the determined cross entropy.

Here, the concept density vector is a vector for representing the amount of abstract information carried by a sentence vector. The amount of information contained in the original text input to the encoder is expressed as a concept density vector, and word generation in the decoder is guided so that it can express an exactly corresponding concept density vector, so that the various possibilities unnecessarily considered during word selection can be excluded. The first concept density vector according to an embodiment is obtained by performing an affine transformation on the original text, and the second concept density vector according to an embodiment may be obtained using the number of unique vocabulary tokens of the translated text and the embedding vector length.

Accordingly, the RNN-LSTM-based neural network machine translation apparatus of the present disclosure can add, during training, a network capable of adjusting the strength with which the constraint is applied, and in the process of generating a predicted sentence it can remove the constraint-related network and thereby use the structure of an ordinary neural machine translation apparatus.

Referring to FIG. 2, the input unit 210 receives text, image sequences, audio (for example, voice or music), additional information (for example, an EPG), and the like from outside the neural network machine translation apparatus 200 under the control of the control unit 220.

The input unit 210 according to an embodiment may receive a sentence sequence for training a neural machine translation system having an encoder-decoder mechanism.

The control unit 220 controls the overall operation of the neural network machine translation apparatus 200 and the signal flow between its internal components, and processes data. When there is a user input or a preset stored condition is satisfied, the control unit 220 may use various data stored in the storage unit 230 and may execute various applications. The control unit 220 of FIG. 2 may further include the encoder 310, the decoder 320, and the constraint influence control layer unit 330 of FIG. 3. However, these are only some of the components needed to describe the present embodiment, and the components included in the control unit 220 are not limited to this example.

The control unit 220 according to an embodiment may generate a first concept density vector for the original text, generate a second concept density vector for the translated text corresponding to the original text, determine the distance between the generated first concept density vector and the generated second concept density vector, determine a predicted text for the translated text based on the original text, determine the cross entropy of the determined predicted text and the translated text, and train the neural network based on a loss function obtained using the determined distance and the determined cross entropy.

The storage unit 230 may store, under the control of the control unit 220, various data, programs, or applications for driving and controlling the neural network machine translation apparatus 200.

The storage unit 230 according to an embodiment may store data such as the first concept density vector, the second concept density vector, the distance between the first concept density vector and the second concept density vector, the cross entropy, or the loss function.

The operation of the neural network machine translation apparatus based on neural network learning using the constraint influence control layer is described below with reference to FIG. 3.

Referring to FIG. 3, the neural network machine translation apparatus 200 according to an embodiment has a structure in which a layer for controlling neural network learning is added to the encoder-decoder mechanism of an ordinary long short-term memory recurrent neural network (long short term memory RNN, RNN-LSTM).

The constraint influence control layer unit (constraint strength control layer, 330) according to an embodiment has the structure of a linear regression model commonly used in deep neural networks. To train the constraint influence control layer unit 330, the following are used: the concept density vector for the original text (source concept density vector, Vs; referred to as the first concept density vector for convenience), the concept density vector for the translated text (target concept density vector; referred to as the second concept density vector for convenience), the distance between the first concept density vector and the second concept density vector, and the cross entropy of the predicted text and the translated text.

Specifically, the first concept density vector according to an embodiment may be obtained by taking the sentence vector Cs output from the encoder 310 as input and performing an affine transformation using a weight matrix We and a bias vector be (Vs = We·Cs + be). Here, the sentence vector Cs may be generated by combining the outputs of the top layer of the encoder 310 for steps 1 through t, where t is the length of the input token sequence.
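The affine transformation Vs = We·Cs + be can be sketched in plain Python as follows. The helper name `affine` and all numeric values are made up for illustration; in the apparatus, We and be would be trained parameters and Cs would come from the encoder.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def affine(weight: Matrix, bias: Vector, sentence_vec: Vector) -> Vector:
    """Compute V_s = W_e . C_s + b_e, one output element per row of W_e."""
    return [sum(w * c for w, c in zip(row, sentence_vec)) + b
            for row, b in zip(weight, bias)]


# Made-up example values (assumptions, not from the disclosure).
C_s = [1.0, 2.0, 3.0]        # sentence vector from the encoder
W_e = [[1.0, 0.0, 0.0],      # 2 x 3 weight matrix
       [0.0, 0.5, 0.5]]
b_e = [0.5, -1.0]            # bias vector

V_s = affine(W_e, b_e, C_s)  # first concept density vector
print(V_s)
```

The number of rows of We sets the length of Vs, which must equal the embedding vector length so that Vs can be compared with the second concept density vector.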

In addition, the second concept density vector according to an embodiment may be obtained using the number of unique vocabulary tokens of the translated text and the embedding vector length, for example as follows. In an ordinary training method for neural machine translation, each vocabulary token must be converted into a one-dimensional vector of floating-point values, and the part that performs this conversion is called the embedding matrix. The embedding matrix has a size of (number of unique vocabulary tokens of the translated text) x (embedding vector length). In the training stage, if, for example, the translated text 'I have an apple.' exists, the output embedding vectors produced by an embedding matrix with 5 unique vocabulary tokens and an embedding vector length of 6 are as follows.

I      ->   0.1     0.5    0.23  -0.12  -0.01   0.89
have   ->  -0.1    -0.21   0.17   0.15   1.13   0.68
an     ->  -0.87   -0.12  -1.3    1.5    0.12   1.0
apple  ->  -1.52    1.89   0.25   0.99   0.13  -1.21
.      ->   0.001   0.52  -0.02  -0.81  -1.0    0.23

Once each token of the sequence has been replaced with its embedding vector in this way, the second concept density vector may be obtained by adding these vectors together (sum of vector elements) (340). Accordingly, the second concept density vector for the translated text 'I have an apple.' is as follows.

Second concept density vector for the translated text = -2.389 2.58 -0.67 1.71 0.37 1.59
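The summation can be checked with a short sketch that uses the example embedding values listed above. The names `EMBEDDINGS` and `target_concept_density` are hypothetical; a dictionary stands in for the embedding matrix lookup.

```python
from typing import Dict, List

# Example embedding vectors for the five tokens of 'I have an apple.'
EMBEDDINGS: Dict[str, List[float]] = {
    "I":     [0.1, 0.5, 0.23, -0.12, -0.01, 0.89],
    "have":  [-0.1, -0.21, 0.17, 0.15, 1.13, 0.68],
    "an":    [-0.87, -0.12, -1.3, 1.5, 0.12, 1.0],
    "apple": [-1.52, 1.89, 0.25, 0.99, 0.13, -1.21],
    ".":     [0.001, 0.52, -0.02, -0.81, -1.0, 0.23],
}


def target_concept_density(tokens: List[str]) -> List[float]:
    """Element-wise sum of the embedding vectors of the given tokens."""
    rows = [EMBEDDINGS[tok] for tok in tokens]
    return [sum(column) for column in zip(*rows)]


V_t = target_concept_density(["I", "have", "an", "apple", "."])
print([round(x, 3) for x in V_t])
```

Rounded to three decimals, the result agrees with the second concept density vector stated above.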

The distance (350) between the concept density vector for the original text and the concept density vector for the translated text according to an embodiment may be the Euclidean distance between the two vectors, but is not limited thereto and may include other mathematical methods capable of expressing the distance between two vectors. Using the Euclidean distance, the distance between the concept density vector Vs for the original text and the concept density vector Vt for the translated text can be expressed as in Equation 1.
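For reference, the Euclidean distance between two n-dimensional vectors is conventionally written as shown below. The element index i and the dimension n are notation introduced here as an assumption, and the exact typography of Equation 1 in the publication may differ.

```latex
d(V_s, V_t) = \lVert V_s - V_t \rVert_2
            = \sqrt{\sum_{i=1}^{n} \left( V_{s,i} - V_{t,i} \right)^{2}}
```

The distance is zero only when the two concept density vectors coincide, which is why reducing it pulls the two vectors toward each other.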

Meanwhile, once the concept density vectors for the original text and the translated text have each been determined, the ordinary neural network training steps are carried out, and the cross entropy c(Θ,D) between the predicted text finally output from the decoder 320 and the translated text, which is the actual correct answer, can be determined.

In the conventional training stage of neural machine translation, the neural network is trained in the direction that minimizes the cross entropy between the predicted text and the translated text. In contrast, the neural network machine translation apparatus of the present disclosure can train the neural network so as to minimize a loss function loss(Θ,D) that combines the cross entropy between the predicted text and the translated text with the distance between the concept density vectors for the original text and the translated text (360). The loss function according to an embodiment can be expressed as in Equation 2.

According to an embodiment, the loss function can be defined as a linear combination of the cross entropy and the distance between the two conceptual density vectors, as in Equation 2, but is not limited thereto; the way the terms are combined may be changed to suit the training intent of the neural network and the translation method.
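The linear combination described above can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the function names and the `weight` coefficient are assumptions introduced here, since the text does not state the coefficients used in Equation 2.

```python
import math


def euclidean_distance(v_s, v_t):
    """Euclidean distance between two equal-length vectors."""
    if len(v_s) != len(v_t):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v_s, v_t)))


def combined_loss(cross_entropy, v_s, v_t, weight=1.0):
    """Cross entropy plus a weighted vector distance.

    `weight` is a hypothetical coefficient for the distance term;
    the source only says the two terms are linearly combined.
    """
    return cross_entropy + weight * euclidean_distance(v_s, v_t)


# Toy values, made up for illustration only.
print(combined_loss(2.0, [0.0, 0.0], [3.0, 4.0], weight=0.5))  # 4.5
```

With `weight` set to 0 the loss reduces to the plain cross entropy of conventional training, so the coefficient controls how strongly the distance constraint influences learning.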

Therefore, updating the neural network using the loss function can narrow the distance between the conceptual density vectors for the original text and the translated text, which has the effect of bringing the meanings expressed in the respective languages closer together.

FIG. 4 is a block diagram illustrating the configuration of a neural network machine translation apparatus using a constraint influence control layer according to another embodiment.

According to an embodiment, when automatic translation is predicted (or performed) using a neural network model whose training has been completed, the neural network machine translation apparatus of the present disclosure can perform prediction in the same way as a conventional neural machine translation process, without going through the conceptual density vector generation used during training. Referring to FIG. 4, it can be seen that the neural network machine translation apparatus (400) does not include the constraint influence control layer unit of FIG. 3 and the like.

FIG. 5 is a flowchart illustrating a method of operating a neural network machine translation apparatus using a constraint influence control layer according to an embodiment.

In step 500, the neural network machine translation apparatus may generate a first conceptual density vector for the original text. According to an embodiment, the first conceptual density vector may be obtained by taking the sentence vector produced by the encoder as input and performing an affine transformation using a weight matrix We and a bias vector be.
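The affine transformation in step 500 amounts to computing We·h + be for an encoder sentence vector h. A minimal sketch follows, assuming the matrix is stored as a list of rows; the function name, the variable names, and the toy values are hypothetical and not taken from the patent.

```python
def affine_transform(weight, bias, sentence_vector):
    """Return weight @ sentence_vector + bias using plain Python lists.

    weight: list of rows (the weight matrix We)
    bias: list with one entry per row of weight (the bias vector be)
    sentence_vector: encoder output for the original text
    """
    if len(weight) != len(bias):
        raise ValueError("weight must have one row per bias entry")
    if any(len(row) != len(sentence_vector) for row in weight):
        raise ValueError("each weight row must match the sentence vector length")
    return [
        sum(w * h for w, h in zip(row, sentence_vector)) + b
        for row, b in zip(weight, bias)
    ]


# Toy values, made up for illustration only.
W_e = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
b_e = [0.5, -0.5]
h = [1.0, 2.0, 3.0]
print(affine_transform(W_e, b_e, h))  # [7.5, -1.5]
```

The number of rows in the weight matrix fixes the length of the resulting conceptual density vector, so it would need to match the length of the vector produced for the translated text before a distance can be taken.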

In step 510, the neural network machine translation apparatus may generate a second conceptual density vector for the translated text corresponding to the original text. According to an embodiment, the second conceptual density vector may be obtained using the number of unique lexical tokens of the translated text and the embedding vector length.

In step 520, the neural network machine translation apparatus may determine the distance between the generated first conceptual density vector and the generated second conceptual density vector. According to an embodiment, the distance between the first conceptual density vector and the second conceptual density vector may be obtained using the Euclidean distance.

In step 530, the neural network machine translation apparatus may determine a predicted text for the translated text based on the original text, and determine the cross entropy between the determined predicted text and the translated text.
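The cross entropy in step 530 can be illustrated with a token-level sketch. The patent does not specify the exact form, so several points here are assumptions: the decoder is taken to output one probability distribution over the vocabulary per position, the loss is averaged over tokens, and probabilities are clamped at a small `eps` to avoid taking the log of zero. All names are hypothetical.

```python
import math


def cross_entropy(predicted_probs, reference_ids, eps=1e-12):
    """Mean negative log-likelihood of the reference tokens.

    predicted_probs[t] is the predicted distribution at position t;
    reference_ids[t] is the index of the correct token at position t.
    """
    if not reference_ids or len(predicted_probs) != len(reference_ids):
        raise ValueError("need one distribution per reference token")
    total = 0.0
    for dist, token_id in zip(predicted_probs, reference_ids):
        # Clamp so that a zero probability does not raise in math.log.
        total -= math.log(max(dist[token_id], eps))
    return total / len(reference_ids)


# Toy two-token example with a two-word vocabulary (made-up numbers).
probs = [[0.5, 0.5], [0.25, 0.75]]
refs = [0, 1]
print(cross_entropy(probs, refs))
```

A prediction that puts all of its probability on the reference tokens gives a cross entropy of zero, which is the minimum this term can reach.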

In step 540, the neural network machine translation apparatus may train the neural network based on a loss function obtained using the determined distance and the determined cross entropy. According to an embodiment, the loss function may be obtained as a linear combination of the cross entropy and the distance between the two conceptual density vectors.

With reference to FIGS. 1 to 5, a neural network learning method using a constraint influence control layer in a neural network machine translation apparatus according to an embodiment of the present disclosure has been described above.

According to the present disclosure, a neural network machine translation apparatus and an operation method thereof can be provided for improving the training efficiency of neural machine translation and the accuracy of translation.

Meanwhile, according to an aspect of the present disclosure, software or a computer-readable medium having executable instructions for performing the method of operating the neural network machine translation apparatus may be provided. The executable instructions include: generating a first conceptual density vector for an original text; generating a second conceptual density vector for a translated text corresponding to the original text; determining a distance between the generated first conceptual density vector and the generated second conceptual density vector; determining a predicted text for the translated text based on the original text and determining a cross entropy between the determined predicted text and the translated text; and training a neural network based on a loss function obtained using the determined distance and the determined cross entropy, wherein the first conceptual density vector is obtained by performing an affine transformation on the original text, and the second conceptual density vector is obtained using the number of unique lexical tokens of the translated text and the embedding vector length.

Although the exemplary methods of the present disclosure are expressed as a series of operations for clarity of description, this is not intended to limit the order in which the steps are performed, and the steps may be performed simultaneously or in a different order where necessary. To implement a method according to the present disclosure, the illustrated steps may be supplemented with other steps, some steps may be omitted while the remaining steps are kept, or some steps may be omitted and additional other steps included.

The various embodiments of the present disclosure do not list all possible combinations but are intended to describe representative aspects of the present disclosure, and the matters described in the various embodiments may be applied independently or in a combination of two or more.

In addition, the various embodiments of the present disclosure may be implemented by hardware, firmware, software, or a combination thereof. In the case of a hardware implementation, they may be implemented by one or more ASICs (Application Specific Integrated Circuits), DSPs (Digital Signal Processors), DSPDs (Digital Signal Processing Devices), PLDs (Programmable Logic Devices), FPGAs (Field Programmable Gate Arrays), general-purpose processors, controllers, microcontrollers, microprocessors, and the like.

The scope of the present disclosure includes software or machine-executable instructions (for example, an operating system, applications, firmware, programs, and the like) that cause operations according to the methods of the various embodiments to be executed on a device or computer, and a non-transitory computer-readable medium in which such software or instructions are stored and from which they are executable on a device or computer.

300: Neural network machine translation apparatus
310: Encoder
320: Decoder
330: Constraint influence control layer unit

Claims

A method of operating a neural network machine translation apparatus based on neural network learning using a constraint influence control layer, the method comprising:
Generating a first conceptual density vector for an original text;
Generating a second conceptual density vector for a translated text corresponding to the original text;
Determining a distance between the generated first conceptual density vector and the generated second conceptual density vector;
Determining a predicted text for the translated text based on the original text, and determining a cross entropy between the determined predicted text and the translated text; and
Training a neural network based on a loss function obtained using the determined distance and the determined cross entropy,
Wherein the first conceptual density vector is obtained by performing an affine transformation on the original text, and the second conceptual density vector is obtained using the number of unique lexical tokens of the translated text and the embedding vector length.