JP7262231B2

JP7262231B2 - learning devices and programs

Info

Publication number: JP7262231B2
Application number: JP2019009658A
Authority: JP
Inventors: 秀弥美野; 功雄後藤; 一郎山田
Original assignee: Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2019-01-23
Filing date: 2019-01-23
Publication date: 2023-04-21
Anticipated expiration: 2039-01-23
Also published as: JP2020119244A

Description

The present invention relates to a translation device, a learning device, and a program.

In machine translation based on neural-network machine learning, translation accuracy improves as the amount of training data grows. To improve translation accuracy it is therefore desirable to prepare as much training data as possible, but doing so is costly. One way to increase the amount of training data is to train on data from different domains together; however, using data from different domains lowers translation accuracy.

In the technique described in Non-Patent Document 1, machine learning is first performed on the combined training data of a plurality of domains, and the model is then retrained using only data belonging to the target domain.

In the technique described in Non-Patent Document 2, which addresses a classification problem, part of the model is shared among models trained on data from different domains.

Patent Document 1 describes a feature weight optimization device for automatic translation. In this device, a feature weight optimization unit 278 uses a plurality of domain development sets 212 to optimize each feature weight used when translating natural language by linear interpolation of features, or of their logarithms, obtained from a plurality of domain-specific statistical models 272 and a general-purpose statistical model 274. The feature weight optimization unit 278 has a domain-specific feature storage area for each of the domain development sets 212. Each such area includes a first area that stores features of the general-purpose statistical model, a plurality of second areas that store features obtained from the plurality of domain development sets, and a third area that stores the value of the loss function used for weight optimization.

Patent Document 1: JP 2017-151804 A

Non-Patent Document 1: Rico Sennrich, Barry Haddow, Alexandra Birch, Improving neural machine translation models with monolingual data, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, p.86-96, Berlin, Germany, August 7-12, 2016.

Non-Patent Document 2: Young-Bum Kim, Karl Stratos, Ruhi Sarikaya, Frustratingly Easy Neural Domain Adaptation, Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 387-396, Osaka, Japan, December 11-17 2016.

The conventional techniques have the problem that knowledge cannot be shared among translation models for a plurality of domains. In the technique described in Non-Patent Document 1, training data of differing character (for example, training data from different domains, such as travel-conversation sentences and patent-document sentences) is combined for training, so translation accuracy sometimes drops. In Non-Patent Document 2, part of the model is shared among models trained on data from different domains for a classification problem, but the technique cannot be applied to generation problems such as translation.

In machine translation, accuracy is known to increase with the amount of training data, so training separately on the data of each domain is inefficient. The present invention has been made in view of the above circumstances, and aims to provide a translation device, a learning device, and a program that allow knowledge to be shared among translation models for a plurality of domains without completely separating the training data of the different domains.

[1] To solve the above problem, a learning device according to one aspect of the present invention comprises: a first encoder unit that, based on encoding parameters for a first domain, encodes a source-language input sentence belonging to the first domain; a second encoder unit that, based on encoding parameters for a second domain, encodes a source-language input sentence belonging to the second domain; a shared encoder unit that, based on encoding parameters shared between the first domain and the second domain, encodes the input sentence belonging to either the first domain or the second domain; a first decoder unit that generates an output sentence corresponding to the input sentence based on a first semantic vector output as the result of encoding in the first encoder unit, a common semantic vector output as the result of encoding in the shared encoder unit, and decoding parameters for the first domain; and a second decoder unit that generates an output sentence corresponding to the input sentence based on a second semantic vector output as the result of encoding in the second encoder unit, the common semantic vector output as the result of encoding in the shared encoder unit, and decoding parameters for the second domain. When a sentence pair consisting of a source-language sentence and a target-language sentence belonging to the first domain is input as training data, the learning device generates an output sentence from the source-language input sentence through processing by the first encoder unit, the shared encoder unit, and the first decoder unit, and, based on the difference between the target-language sentence of the sentence pair and that output sentence, updates the encoding parameters of the first encoder unit, the encoding parameters of the shared encoder unit, and the decoding parameters of the first decoder unit. When a sentence pair consisting of a source-language sentence and a target-language sentence belonging to the second domain is input as training data, the learning device generates an output sentence from the source-language input sentence through processing by the second encoder unit, the shared encoder unit, and the second decoder unit, and, based on the difference between the target-language sentence of the sentence pair and that output sentence, updates the encoding parameters of the second encoder unit, the encoding parameters of the shared encoder unit, and the decoding parameters of the second decoder unit.

[2] In one aspect of the present invention, the above learning device further comprises a first dimension reduction unit that reduces the dimensionality of a concatenated vector, obtained by arranging the elements of the first semantic vector output as the result of encoding by the first encoder unit alongside the elements of the common semantic vector output as the result of encoding by the shared encoder unit, based on dimension reduction parameters for the first domain, and outputs the result as a first reduced vector. The first decoder unit generates the output sentence corresponding to the input sentence based on the first reduced vector output by the first dimension reduction unit and the decoding parameters for the first domain. The dimension reduction parameters of the first dimension reduction unit are also updated based on the difference between the target-language sentence of the sentence pair and the output sentence from the first decoder unit.

[3] In one aspect of the present invention, the above learning device further comprises a second dimension reduction unit that reduces the dimensionality of a concatenated vector, obtained by arranging the elements of the second semantic vector output as the result of encoding by the second encoder unit alongside the elements of the common semantic vector output as the result of encoding by the shared encoder unit, based on dimension reduction parameters for the second domain, and outputs the result as a second reduced vector. The second decoder unit generates the output sentence corresponding to the input sentence based on the second reduced vector output by the second dimension reduction unit and the decoding parameters for the second domain. The dimension reduction parameters of the second dimension reduction unit are also updated based on the difference between the target-language sentence of the sentence pair and the output sentence from the second decoder unit.

[4] In one aspect of the present invention, the above learning device further comprises: a first orthogonal error calculation unit that calculates a first orthogonal error, which is the orthogonal error between the first semantic vector output as the result of encoding by the first encoder unit and the common semantic vector output as the result of encoding by the shared encoder unit; and a second orthogonal error calculation unit that calculates a second orthogonal error, which is the orthogonal error between the second semantic vector output as the result of encoding by the second encoder unit and the common semantic vector output as the result of encoding by the shared encoder unit. The encoding parameters of the first encoder unit, the encoding parameters of the shared encoder unit, and the decoding parameters of the first decoder unit are updated based not only on the difference between the target-language sentence of the sentence pair and the output sentence output from the first decoder unit but also on the first orthogonal error calculated by the first orthogonal error calculation unit. Likewise, the encoding parameters of the second encoder unit, the encoding parameters of the shared encoder unit, and the decoding parameters of the second decoder unit are updated based not only on the difference between the target-language sentence of the sentence pair and the output sentence output from the second decoder unit but also on the second orthogonal error calculated by the second orthogonal error calculation unit.
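As a rough illustration of [4], the following Python sketch adds an orthogonality penalty to a translation loss. The text at this point does not define how the orthogonal error is computed, so the squared inner product used below is an assumption, as are the names orthogonal_error and total_loss and the weight alpha.

```python
def orthogonal_error(domain_vec, shared_vec):
    """Assumed definition: squared inner product, zero when the vectors are orthogonal."""
    dot = sum(d * s for d, s in zip(domain_vec, shared_vec))
    return dot * dot


def total_loss(translation_loss, domain_vec, shared_vec, alpha=1.0):
    """Hypothetical combined objective: output difference plus weighted orthogonal error."""
    return translation_loss + alpha * orthogonal_error(domain_vec, shared_vec)


# Orthogonal vectors add no penalty; overlapping vectors do.
loss_a = total_loss(0.5, [1.0, 0.0], [0.0, 1.0])  # penalty is 0.0
loss_b = total_loss(0.5, [1.0, 2.0], [3.0, 4.0])  # penalty is (3 + 8) ** 2 = 121.0
```

Minimizing such a penalty would push each domain-specific semantic vector away from the direction of the common semantic vector, which is one plausible way to keep the domain-specific and shared encoders from learning the same information.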

[5] A translation device according to one aspect of the present invention comprises: a first encoder unit that encodes a source-language input sentence based on encoding parameters for a first domain; a shared encoder unit that encodes the input sentence based on encoding parameters shared between the first domain and other domains; and a first decoder unit that generates an output sentence corresponding to the input sentence based on a first semantic vector output as the result of encoding in the first encoder unit, a common semantic vector output as the result of encoding in the shared encoder unit, and decoding parameters for the first domain.

[6] In one aspect of the present invention, the above translation device further comprises a first dimension reduction unit that reduces the dimensionality of a concatenated vector, obtained by arranging the elements of the first semantic vector output as the result of encoding by the first encoder unit alongside the elements of the common semantic vector output as the result of encoding by the shared encoder unit, based on dimension reduction parameters for the first domain, and outputs the result as a first reduced vector. The first decoder unit generates the output sentence corresponding to the input sentence based on the first reduced vector output by the first dimension reduction unit and the decoding parameters for the first domain.

[7] In one aspect of the present invention, in the above translation device, the number of the other domains is one or more.

[8] In one aspect of the present invention, in the above translation device, the encoding parameters of the first encoder unit, the encoding parameters of the shared encoder unit, and the decoding parameters of the first decoder unit are those obtained through processing by the learning device described in any one of [1] to [4] above.

[9] One aspect of the present invention is a program for causing a computer to function as a learning device comprising: a first encoder unit that, based on encoding parameters for a first domain, encodes a source-language input sentence belonging to the first domain; a second encoder unit that, based on encoding parameters for a second domain, encodes a source-language input sentence belonging to the second domain; a shared encoder unit that, based on encoding parameters shared between the first domain and the second domain, encodes the input sentence belonging to either the first domain or the second domain; a first decoder unit that generates an output sentence corresponding to the input sentence based on a first semantic vector output as the result of encoding in the first encoder unit, a common semantic vector output as the result of encoding in the shared encoder unit, and decoding parameters for the first domain; and a second decoder unit that generates an output sentence corresponding to the input sentence based on a second semantic vector output as the result of encoding in the second encoder unit, the common semantic vector output as the result of encoding in the shared encoder unit, and decoding parameters for the second domain. When a sentence pair consisting of a source-language sentence and a target-language sentence belonging to the first domain is input as training data, the learning device generates an output sentence from the source-language input sentence through processing by the first encoder unit, the shared encoder unit, and the first decoder unit, and, based on the difference between the target-language sentence of the sentence pair and that output sentence, updates the encoding parameters of the first encoder unit, the encoding parameters of the shared encoder unit, and the decoding parameters of the first decoder unit. When a sentence pair consisting of a source-language sentence and a target-language sentence belonging to the second domain is input as training data, the learning device generates an output sentence from the source-language input sentence through processing by the second encoder unit, the shared encoder unit, and the second decoder unit, and, based on the difference between the target-language sentence of the sentence pair and that output sentence, updates the encoding parameters of the second encoder unit, the encoding parameters of the shared encoder unit, and the decoding parameters of the second decoder unit.

[10] One aspect of the present invention is a program for causing a computer to function as a translation device comprising: a first encoder unit that encodes a source-language input sentence based on encoding parameters for a first domain; a shared encoder unit that encodes the input sentence based on encoding parameters shared between the first domain and other domains; and a first decoder unit that generates an output sentence corresponding to the input sentence based on a first semantic vector output as the result of encoding in the first encoder unit, a common semantic vector output as the result of encoding in the shared encoder unit, and decoding parameters for the first domain.

According to the present invention, learning can be performed so that knowledge is shared between different domains, and translation can be performed based on the knowledge (model) shared between different domains. Realizing such a knowledge-sharing mechanism makes it possible to increase the amount of training data, and translation accuracy can therefore be improved.

FIG. 1 is a block diagram showing the schematic functional configuration of a translation device (learning device) according to a first embodiment of the present invention.
FIG. 2 is a schematic diagram showing the model of encoding processing in each of the first encoder unit, the second encoder unit, and the shared encoder unit according to the same embodiment.
FIG. 3 is a schematic diagram showing the model of decoding processing in each of the first decoder unit and the second decoder unit according to the same embodiment.
FIG. 4 is a schematic diagram showing the dimension reduction processing in each of the first dimension reduction unit and the second dimension reduction unit according to the same embodiment.
FIG. 5 is a flowchart showing the procedure of learning processing of the translation device according to the same embodiment.
FIG. 6 is a flowchart showing the procedure of translation processing of the translation device according to the same embodiment.
FIG. 7 is a block diagram showing the schematic functional configuration of a translation device (learning device) according to a second embodiment.
FIG. 8 is a block diagram showing the schematic functional configuration of a translation device (learning device) according to a third embodiment.
FIG. 9 is a block diagram showing the schematic functional configuration of a translation device (learning device) according to a fourth embodiment.

[First embodiment]
Next, an embodiment of the present invention will be described with reference to the drawings. In this embodiment, a neural machine translation model is divided into a part shared between domains and a part used only within a single domain. The part shared between domains is shared by a plurality of machine translation systems and trained jointly.

FIG. 1 is a block diagram showing the schematic functional configuration of the translation device according to this embodiment. The illustrated translation device 1 can also be regarded as a learning device for training a translation model. As illustrated, the translation device 1 includes a first input unit 11, a first encoder unit 12, a first dimension reduction unit 13, a first decoder unit 14, a first output unit 15, a second input unit 21, a second encoder unit 22, a second dimension reduction unit 23, a second decoder unit 24, a second output unit 25, and a shared encoder unit 31. Each of these functional units can be realized by, for example, a computer and a program. Each functional unit has storage means as needed. The storage means is, for example, a program variable or memory allocated by execution of the program. Non-volatile storage means such as a magnetic hard disk drive or a solid state drive (SSD) may also be used as needed. At least part of the function of each functional unit may be realized as a dedicated electronic circuit rather than as a program.

In the following, each of the first encoder unit 12, the second encoder unit 22, and the shared encoder unit 31 may be referred to simply as an "encoder". Likewise, each of the first decoder unit 14 and the second decoder unit 24 may be referred to simply as a "decoder".

The first encoder unit 12, the first dimension reduction unit 13, the first decoder unit 14, and the shared encoder unit 31 may be collectively called a first translation model unit 17. Similarly, the second encoder unit 22, the second dimension reduction unit 23, the second decoder unit 24, and the shared encoder unit 31 may be collectively called a second translation model unit 27. The first translation model unit 17 and the second translation model unit 27 function as translation models for mutually different domains.
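The grouping above can be pictured with a minimal Python sketch in which two model objects hold their own domain-specific parts but reference one and the same shared encoder. Everything here is illustrative: the class TranslationModel and the toy_encoder, toy_reducer, and toy_decoder stand-ins are hypothetical names, and the toy functions merely take the place of trained neural components.

```python
class TranslationModel:
    """One domain's model: its own encoder, reducer and decoder plus the shared encoder."""

    def __init__(self, domain_encoder, shared_encoder, reducer, decoder):
        self.domain_encoder = domain_encoder
        self.shared_encoder = shared_encoder
        self.reducer = reducer
        self.decoder = decoder

    def translate(self, tokens):
        domain_vec = self.domain_encoder(tokens)
        shared_vec = self.shared_encoder(tokens)
        return self.decoder(self.reducer(domain_vec, shared_vec))


def toy_encoder(scale):
    # Stand-in for a trained neural encoder: a 2-dim vector built from token counts.
    return lambda tokens: [scale * len(tokens), scale * sum(len(t) for t in tokens)]


def toy_reducer(domain_vec, shared_vec):
    # Stand-in for dimension reduction: element-wise mean of the two vectors.
    return [(d + s) / 2 for d, s in zip(domain_vec, shared_vec)]


def toy_decoder(tag):
    # Stand-in for a neural decoder: emits a domain tag and the first vector element.
    return lambda vec: [f"<{tag}>", f"{vec[0]:.1f}"]


shared = toy_encoder(1.0)  # a single object, referenced by both models
model_17 = TranslationModel(toy_encoder(2.0), shared, toy_reducer, toy_decoder("domain1"))
model_27 = TranslationModel(toy_encoder(3.0), shared, toy_reducer, toy_decoder("domain2"))

out_1 = model_17.translate(["hello", "world"])
out_2 = model_27.translate(["hello", "world"])
```

Because both objects point to the same shared encoder, anything learned by that component while training on one domain is automatically available to the other domain's model.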

Here, a domain is the field to which a sentence to be translated belongs. For example, categories such as travel-conversation sentences, patent sentences, broadcast-caption sentences, newspaper sentences, and news-announcement sentences can each be treated as a domain. The domains listed above are only examples, and domains in general are not limited to them. For convenience, the domain handled by the first translation model unit 17 is called the first domain, and the domain handled by the second translation model unit 27 is called the second domain. In other words, the translation device 1 has a configuration in which translation models for two domains, the first domain and the second domain, are integrated.

When performing translation processing, the translation device 1 outputs the result of translating an input sentence as an output sentence. The translation device 1 can also function as a learning device that acquires a set of pairs of input and output sentences as training data and performs machine learning of the translation model.

The first input unit 11 acquires a sentence belonging to the first domain from outside and passes it to the first encoder unit 12 and the shared encoder unit 31. During learning processing, the first input unit 11 acquires a sentence pair of the training data and passes the source-language sentence of that pair to the first encoder unit 12 and the shared encoder unit 31. During translation processing, the first input unit 11 acquires the source-language sentence to be translated and passes that input sentence to the first encoder unit 12 and the shared encoder unit 31. To prepare the input for the downstream encoders, the first input unit 11 may perform processing such as morphological analysis on the input sentence. Alternatively, the first input unit 11 may acquire input sentence data that is already represented as a word sequence split into individual words.

The first encoder unit 12 encodes a source-language input sentence belonging to the first domain based on the encoding parameters for the first domain. These parameters are referenced in the translation processing described later, and can be updated in the learning processing described later by a method such as error backpropagation. Because the first encoder unit 12 encodes input sentences belonging to the first domain, it serves to accumulate knowledge specific to the first domain during the learning processing. In other words, the first encoder unit 12 extracts the characteristics of the first domain as knowledge.
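Which parameters a given training pair touches, as described here and in [1] to [3], can be sketched as follows. This is only an illustration under stated assumptions: the group names, the sgd_step helper, the learning rate, and the placeholder gradients are all hypothetical, and plain gradient descent stands in for whatever optimizer is actually used with backpropagation.

```python
# Parameter groups touched by a training pair from each domain (names are hypothetical).
UPDATED_GROUPS = {
    "domain1": ("encoder1", "shared_encoder", "reducer1", "decoder1"),
    "domain2": ("encoder2", "shared_encoder", "reducer2", "decoder2"),
}


def sgd_step(params, grads, domain, lr=0.1):
    """Apply one plain gradient-descent step to the groups used by `domain`."""
    for group in UPDATED_GROUPS[domain]:
        params[group] = [p - lr * g for p, g in zip(params[group], grads[group])]
    return params


groups = sorted(set(UPDATED_GROUPS["domain1"]) | set(UPDATED_GROUPS["domain2"]))
params = {name: [1.0, 1.0] for name in groups}
grads = {name: [0.5, -0.5] for name in groups}  # placeholder gradients

sgd_step(params, grads, "domain1")
# encoder1 and shared_encoder moved; encoder2 is untouched by a first-domain pair.
```

The shared encoder appears in both tuples, so it is updated by sentence pairs from either domain, whereas each domain-specific group only ever sees its own domain's data.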

The first dimension reduction unit 13 combines the information in the semantic vector output from the first encoder unit 12 with the information in the semantic vector output from the shared encoder unit 31, and then reduces the dimensionality of that information. Specifically, the first dimension reduction unit 13 concatenates the semantic vector output from the first encoder unit 12 and the semantic vector output from the shared encoder unit 31, and reduces the dimensionality of the concatenated vector by a linear transformation. In this way, the first dimension reduction unit 13 extracts, from the information obtained from the first encoder unit 12 and the information obtained from the shared encoder unit 31, only the part that is useful for translation and discards the rest. The first dimension reduction unit 13 reduces the vector received from the preceding stage to roughly the number of dimensions sufficient for an ordinary machine translation system to operate. For example, if the vector output from the first encoder unit 12 and the vector output from the shared encoder unit 31 have the same number of dimensions, simply concatenating them doubles the number of dimensions, and the first dimension reduction unit 13 then halves it. That is, the first dimension reduction unit 13 reduces the dimensionality to approximately that of the semantic vectors output by the first encoder unit 12 and the shared encoder unit 31. Because the first dimension reduction unit 13 removes part of the information output from the two encoders, the computation in the subsequent stage can be made faster and more efficient. In short, the first dimension reduction unit 13 creates the data to be input to the first decoder unit 14 from the output of the first encoder unit 12 and the output of the shared encoder unit 31.

The first decoder unit 14 generates an output sentence corresponding to the input sentence, based on the first semantic vector output as a result of the encoding process in the first encoder unit, the common semantic vector output as a result of the encoding process in the shared encoder unit, and the parameters of the decoding process for the first domain. The output sentence generated by the first decoder unit 14 is a translation (a sentence in the target language) of the input sentence (a sentence in the source language). In other words, the first decoder unit 14 takes the output of the first dimension reduction unit 13 as its input and outputs a sentence in the target language, which is the translation destination.

The first output unit 15 outputs the sentence produced by the first decoder unit 14 to the outside.

The second input unit 21 performs, for the second domain, the same processing as the first input unit 11 described above. The details and effects of that processing are as already given in the description of the first input unit 11 and are not repeated here.

The second encoder unit 22 performs, for the second domain, the same processing as the first encoder unit 12 described above. The details and effects of that processing are as already given in the description of the first encoder unit 12 and are not repeated here.

The second dimension reduction unit 23 performs, for the second domain, the same processing as the first dimension reduction unit 13 described above. The details and effects of that processing are as already given in the description of the first dimension reduction unit 13 and are not repeated here. That is, the second dimension reduction unit 23 outputs a second reduced-dimension vector, which is the result of the dimension reduction.

The second decoder unit 24 performs, for the second domain, the same processing as the first decoder unit 14 described above. The details and effects of that processing are as already given in the description of the first decoder unit 14 and are not repeated here.

The second output unit 25 performs, for the second domain, the same processing as the first output unit 15 described above. The details and effects of that processing are as already given in the description of the first output unit 15 and are not repeated here.

The shared encoder unit 31 encodes an input sentence belonging to either the first domain or the second domain, based on encoding parameters that are shared between the first domain and the second domain. The parameters of the encoding process in the shared encoder unit 31 are referenced in the translation process described later, and can be updated in the learning process described later by a method such as error backpropagation. Because the shared encoder unit 31 encodes input sentences belonging to either domain (that is, the input sentences of both domains), it accumulates knowledge common to both domains during the learning process. In other words, the shared encoder unit 31 extracts the features common to the domains as knowledge.

In other words, in the configuration of the translation device 1, the encoder part of the translation model for a single domain (the first domain or the second domain) is divided into two. The two parts are an encoder dedicated to that domain and an encoder shared with the other domain. Consequently, the first encoder unit 12, the first dimension reduction unit 13, and the first decoder unit 14 are trained only in the learning process that uses the learning data of the first domain. Likewise, the second encoder unit 22, the second dimension reduction unit 23, and the second decoder unit 24 are trained only in the learning process that uses the learning data of the second domain. In contrast, the shared encoder unit 31 is trained whichever learning data is used, that of the first domain or that of the second domain.

The encoder and decoder models in this embodiment are based on the structure of a recurrent neural network (RNN). Alternatively, the decoder model may be based on a long short-term memory (LSTM) neural network structure, which is a type of RNN. The recurrent neural network itself is existing technology. A recurrent neural network is one kind of general neural network, and is characterized by its ability to process time-series data. The time-series data may be a fixed-length sequence or a variable-length sequence. For example, by treating each element of the time-series data as a word or the like, a recurrent neural network can process a sentence. Recurrent neural networks are described, for example, in the following documents.

Reference: Recurrent Neural Networks: Introduction to RNN, @kiminaka, updated February 12, 2017, URL: https://qiita.com/kiminaka/items/87afd4a433dc655d8cfd
Reference: Natural Language Processing Programming Study Group 8: Recurrent Neural Networks, Graham Neubig, Nara Institute of Science and Technology, URL: http://www.phontron.com/slides/nlp-programming-ja-08-rnn.pdf

Next, an outline of the processing in this embodiment is described with reference to FIGS. 2, 3, and 4.

FIG. 2 is a schematic diagram showing the model of the encoding process in each of the first encoder unit 12, the second encoder unit 22, and the shared encoder unit 31 of this embodiment. As described above, each of the first encoder unit 12, the second encoder unit 22, and the shared encoder unit 31 is implemented using, for example, a recurrent neural network.

In the figure, h_1, h_2, ..., h_M are time-series data corresponding to the input sentence. Each of h_1, h_2, ..., h_M corresponds to a word or the like contained in the input sentence and can be represented as a vector using, for example, a one-hot representation. In the illustrated example, the input sentence "私／は／京都／に／行く／。" ("I go to Kyoto."; slashes denote word boundaries) is input to the encoder. As in this example, punctuation marks and the like are also treated as single words. W_1 and W_2 are parameters of the recurrent neural network. W_1 is a matrix for transforming vector h_i into vector e_i. W_2 is a matrix for transforming vector e_i into vector e_{i+1} (1 ≤ i ≤ M-1). The matrix W_2 is also used when transforming vector e_M into the content vector c_1. The values of the elements of W_1 and W_2 are stored, for example, in updatable memory and can be updated by the machine learning process.
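The one-hot representation mentioned above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the vocabulary is built from the single example sentence; in practice the vocabulary would come from the learning data. The helper names `build_vocab` and `one_hot` are hypothetical and do not appear in the embodiment.

```python
def build_vocab(tokens):
    """Assign an index to each distinct token in order of first appearance.

    Assumption for illustration: the vocabulary is built from the
    example sentence only, not from a full training corpus.
    """
    vocab = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


def one_hot(index, size):
    """Return a vector of length `size` with 1.0 at `index` and 0.0 elsewhere."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


# The example sentence, already split into words (punctuation is one word).
tokens = ["私", "は", "京都", "に", "行く", "。"]
vocab = build_vocab(tokens)

# h[0], ..., h[M-1] correspond to h_1, ..., h_M in the text (M = 6 here).
h = [one_hot(vocab[tok], len(vocab)) for tok in tokens]
```

Each vector in `h` has exactly one non-zero element, so the sequence carries word identity and word order but no notion of similarity between words.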

The encoder processes the time-series input sequentially and finally outputs the content vector c_1 corresponding to the input sentence. That is, the encoder first generates vector e_1 based on the first input h_1 and the parameter W_1. Next, the encoder generates vector e_2 based on the next input h_2 and the parameter W_1, together with the vector e_1 and the parameter W_2. Thereafter, in the same way, the encoder generates vector e_{i+1} based on the input h_{i+1} and the parameter W_1, together with the already generated vector e_i and the parameter W_2 (where 1 ≤ i ≤ M-1). The encoder then outputs the content vector c_1, which is generated based on the last generated vector (e_{i+1} with i = M-1, that is, e_M) and the parameter W_2.
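The recurrence described above can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: the tanh activation, the absence of bias terms, the small dimensions, and the random initial values are all assumptions, and the helper names (`matvec`, `encode`, and so on) are hypothetical.

```python
import math
import random


def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def add(a, b):
    """Element-wise sum of two vectors of equal length."""
    return [x + y for x, y in zip(a, b)]


def tanh_vec(v):
    """Element-wise tanh; the choice of activation is an assumption."""
    return [math.tanh(x) for x in v]


def random_matrix(rows, cols, rng, scale=0.1):
    """Small random initial parameter values (illustrative only)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def encode(h_seq, W_1, W_2):
    """Return the content vector c_1 for the inputs h_1, ..., h_M."""
    # e_1 is generated from h_1 and W_1 only.
    e = tanh_vec(matvec(W_1, h_seq[0]))
    for h_next in h_seq[1:]:
        # e_{i+1} is generated from h_{i+1} with W_1 and from e_i with W_2.
        e = tanh_vec(add(matvec(W_1, h_next), matvec(W_2, e)))
    # c_1 is generated from the last vector e_M and W_2.
    return tanh_vec(matvec(W_2, e))


rng = random.Random(0)
V, H = 6, 4  # vocabulary size and hidden size (toy values)
W_1 = random_matrix(H, V, rng)  # maps a one-hot input to the hidden space
W_2 = random_matrix(H, H, rng)  # maps one hidden vector to the next

# One-hot inputs for a six-word sentence whose words have indices 0..5.
h_seq = [[1.0 if j == i else 0.0 for j in range(V)] for i in range(V)]
c_1 = encode(h_seq, W_1, W_2)
```

The same `encode` function would serve for a domain-specific encoder and for the shared encoder; only the parameter values `W_1` and `W_2` differ between them.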

The content vector c_1 contains the information of the time-series data h_1, h_2, ..., h_M corresponding to the input sentence. The content vector c_1 is, for example, a vector of about 250 dimensions. However, the number of dimensions of the content vector c_1 may be set as appropriate, for example to 250, 500, 1000, or 2000.

FIG. 3 is a schematic diagram showing the model of the decoding process in each of the first decoder unit 14 and the second decoder unit 24 of this embodiment. As described above, each of the first decoder unit 14 and the second decoder unit 24 is implemented using, for example, a recurrent neural network.

In the figure, c_2 is the content vector input to the decoder, and y_1, y_2, ..., y_L1 are the time-series data output from the decoder. Each of y_1, y_2, ..., y_L1 is a vector using a one-hot representation or the like and corresponds to a word. W_3 and W_4 are parameters of the recurrent neural network. W_3 is a matrix for transforming vector d_i into vector d_{i+1} (1 ≤ i ≤ L1-1). W_3 is also used when transforming the vector c_2 input to the decoder into vector d_1. W_4 is a matrix for transforming vector d_i into vector y_i (1 ≤ i ≤ L1). The values of the elements of W_3 and W_4 are likewise stored in memory or the like and can be updated by the machine learning process.

The decoder generates and outputs the series data y_1, y_2, ..., y_L1 based on the input content vector c_2. That is, the decoder first generates vector d_1 based on the input content vector c_2 and the parameter W_3. The decoder then generates the first output data y_1 based on the vector d_1 and the parameter W_4. Next, the decoder generates vector d_2 based on the vector d_1, the parameter W_3, and the generated output data y_1. Thereafter, in the same way, the decoder generates vector d_i based on the vector d_{i-1}, the parameter W_3, and the output data y_{i-1}, and generates output data y_i based on the vector d_i and the parameter W_4 (where 2 ≤ i ≤ L1).
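The decoding recurrence above can be sketched in the same style. The text does not specify how the previous output y_{i-1} enters the computation of d_i, so this sketch assumes a separate feedback matrix `U` (hypothetical, not named in the embodiment). It also assumes a tanh activation, greedy selection of the highest-scoring word, and a fixed output length instead of an end-of-sentence criterion.

```python
import math
import random


def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def add(a, b):
    """Element-wise sum of two vectors of equal length."""
    return [x + y for x, y in zip(a, b)]


def tanh_vec(v):
    """Element-wise tanh; the choice of activation is an assumption."""
    return [math.tanh(x) for x in v]


def one_hot(index, size):
    """Return a one-hot vector of length `size`."""
    return [1.0 if j == index else 0.0 for j in range(size)]


def random_matrix(rows, cols, rng, scale=0.1):
    """Small random initial parameter values (illustrative only)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def decode(c_2, W_3, W_4, U, length):
    """Return the word indices of y_1, ..., y_L1 for the content vector c_2.

    U is a hypothetical matrix that feeds the previous output back into
    the hidden state; the text only says that y_{i-1} is used.
    """
    # d_1 is generated from c_2 and W_3.
    d = tanh_vec(matvec(W_3, c_2))
    outputs = []
    y_prev = None
    for i in range(length):
        if i > 0:
            # d_i is generated from d_{i-1}, W_3 and the previous output y_{i-1}.
            d = tanh_vec(add(matvec(W_3, d), matvec(U, y_prev)))
        # y_i is generated from d_i and W_4; pick the highest-scoring word.
        scores = matvec(W_4, d)
        best = max(range(len(scores)), key=lambda k: scores[k])
        outputs.append(best)
        y_prev = one_hot(best, len(scores))
    return outputs


rng = random.Random(0)
H, V = 4, 5  # hidden size and target vocabulary size (toy values)
W_3 = random_matrix(H, H, rng)
W_4 = random_matrix(V, H, rng)
U = random_matrix(H, V, rng)

c_2 = [0.1, -0.2, 0.3, 0.05]  # stand-in for a content vector
y_indices = decode(c_2, W_3, W_4, U, length=5)
```

With untrained random parameters the selected indices carry no meaning; the sketch only shows the order in which d_i and y_i are produced.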

The time-series data y_1, y_2, ..., y_L1 output by the decoder are the data obtained by decoding the input content vector c_2. As an example, the time-series data y_1, y_2, ..., y_L1 correspond to a word sequence such as "I／go／to／Kyoto／." (slashes denote word boundaries). Here, the period marking the end of the sentence can also be treated as a single word.

FIG. 4 is a schematic diagram showing the dimension reduction processing in each of the first dimension reduction unit 13 and the second dimension reduction unit 23 of this embodiment. As described next, the first dimension reduction unit 13 and the second dimension reduction unit 23 are implemented using a neural network.

In the figure, the vector h_enc is the content vector output from the first encoder unit 12 or the second encoder unit 22, and the vector h_senc is the content vector output from the shared encoder unit 31. The first dimension reduction unit 13 or the second dimension reduction unit 23 first simply concatenates the vector h_enc and the vector h_senc to generate the concatenated vector h_conc. Next, the first dimension reduction unit 13 or the second dimension reduction unit 23 generates the reduced-dimension vector h_lowd based on the concatenated vector h_conc and the parameter W_lowd. This dimension reduction is performed using, for example, a linear transformation. The parameter W_lowd is a matrix whose elements are values that can be updated by the learning process. That is, the vector h_lowd is obtained by multiplying the vector h_conc by the matrix W_lowd, transposing the vector as appropriate.

Specifically, the first dimension reduction unit 13 concatenates the vector h_enc output from the first encoder unit 12 and the vector h_senc output from the shared encoder unit 31, and then reduces the dimensionality to generate the reduced-dimension vector h_lowd. The first dimension reduction unit 13 passes the generated vector h_lowd to the first decoder unit 14. Similarly, the second dimension reduction unit 23 concatenates the vector h_enc output from the second encoder unit 22 and the vector h_senc output from the shared encoder unit 31, and then reduces the dimensionality to generate the reduced-dimension vector h_lowd. The second dimension reduction unit 23 passes the generated vector h_lowd to the second decoder unit 24.

When the vector h_enc and the vector h_senc each have H dimensions, the concatenated vector h_conc has 2H dimensions. The vector h_lowd resulting from the dimension reduction has, for example, H dimensions. In this way, each of the first dimension reduction unit 13 and the second dimension reduction unit 23 outputs a reduced-dimension vector h_lowd that retains information from both the domain-specific encoding result (vector h_enc) and the encoding result common to the domains (vector h_senc) while reducing their redundancy. By having the first dimension reduction unit 13 and the second dimension reduction unit 23, the translation device 1 realizes the translation model with reasonable computational resources.
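The concatenation and linear reduction from 2H to H dimensions can be sketched as follows. This is a minimal illustration assuming a plain matrix multiplication without a bias term; the function name `reduce_dimension` and the hand-picked values of W_lowd are hypothetical, since in the embodiment the elements of W_lowd are learned.

```python
def reduce_dimension(h_enc, h_senc, W_lowd):
    """Concatenate h_enc and h_senc (H dims each) and map 2H dims to H dims."""
    h_conc = h_enc + h_senc  # list concatenation: 2H dimensions
    for row in W_lowd:
        if len(row) != len(h_conc):
            raise ValueError("each row of W_lowd must have 2H elements")
    # Linear transformation: h_lowd = W_lowd * h_conc
    return [sum(w * x for w, x in zip(row, h_conc)) for row in W_lowd]


# Toy example with H = 2. This W_lowd simply adds the two encoder
# outputs element by element; learned values would differ.
h_enc = [1.0, 2.0]   # domain-specific encoder output
h_senc = [3.0, 4.0]  # shared encoder output
W_lowd = [
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
]
h_lowd = reduce_dimension(h_enc, h_senc, W_lowd)
```

Because W_lowd has H rows and 2H columns, the decoder receives a vector of the same size as a single encoder output, whichever domain branch produced it.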

Next, the processing procedures of the translation device 1 during the learning process and during the translation process are described with reference to flowcharts.

FIG. 5 is a flowchart showing the procedure of the learning process of the translation device according to this embodiment. As a precondition for this learning process, a large amount of learning data is supplied from the outside. The learning data is a set of pairs, each consisting of an input sentence in the source language and an output sentence (correct sentence) in the target language. Each sentence pair in the learning data belongs to either the first domain or the second domain, and the domain to which it belongs is known. The procedure of the learning process is described below along this flowchart.

In step S1, the translation device 1 selects one unprocessed sentence pair from the learning data supplied from the outside.

In step S2, the translation device 1 determines whether the sentence pair selected in step S1 belongs to the first domain, and branches the processing according to the result. Information indicating whether a sentence pair belongs to the first domain or the second domain is given as part of the learning data. If the sentence pair belongs to the first domain (step S2: YES), the first input unit 11 processes the sentence pair, and the process proceeds to step S3. If the sentence pair does not belong to the first domain, that is, if it belongs to the second domain (step S2: NO), the second input unit 21 processes the sentence pair, and the process proceeds to step S7. When the process proceeds to step S3, steps S3 to S6 are performed in order, and the process then moves to step S11. When the process proceeds to step S7, steps S7 to S10 are performed in order, and the process then moves to step S11.

In step S3, the first input unit 11 divides the input sentence contained in the sentence pair into words and converts it, as appropriate, into time-series vector data. The first input unit 11 passes this time-series vector data to the first encoder unit 12 and the shared encoder unit 31. Each of the first encoder unit 12 and the shared encoder unit 31 processes the time-series data passed from the first input unit 11 and outputs a content vector (vector c_1 in FIG. 2). These content vectors correspond to the vector h_enc and the vector h_senc shown in FIG. 4, respectively.

In step S4, the first dimension reduction unit 13 acquires the vector h_enc from the first encoder unit 12 and the vector h_senc from the shared encoder unit 31. As shown in FIG. 4, the first dimension reduction unit 13 concatenates the vector h_enc and the vector h_senc to generate the vector h_conc. The first dimension reduction unit 13 then reduces the dimensionality of the vector h_conc based on the parameter W_lowd and outputs the vector h_lowd.

In step S5, the translation device 1 inputs the vector h_lowd output from the first dimension reduction unit 13 to the first decoder unit 14. The first decoder unit 14 decodes the vector h_lowd and outputs time-series data as the result of the decoding process. The output time-series data may be converted into a sequence of words as necessary.

In step S6, the translation device 1 calculates the error between the data output by the first decoder unit 14 in step S5 and the correct data contained in the original sentence pair. Based on the calculated error, the translation device 1 then adjusts the parameters in the first encoder unit 12, the first dimension reduction unit 13, the first decoder unit 14, and the shared encoder unit 31 by backpropagation (error backpropagation). That is, the translation device 1 updates the values of these parameters by backpropagation. Specifically, the parameters to be updated are W_1 and W_2 in the first encoder unit 12 (see FIG. 2), W_lowd in the first dimension reduction unit 13 (see FIG. 4), W_3 and W_4 in the first decoder unit 14 (see FIG. 3), and W_1 and W_2 in the shared encoder unit 31 (see FIG. 2). In other words, the machine learning process updates the parameters in the first translation model unit 17.

In steps S7 to S10, the second input unit 21, the second encoder unit 22, the second dimension reduction unit 23, the second decoder unit 24, and the shared encoder unit 31 of the translation device 1 perform the same processing as described for steps S3 to S6. However, whereas steps S3 to S6 concern the first domain, steps S7 to S10 concern the second domain. As a result of this series of steps, the parameters in the second translation model unit 27 are updated by the machine learning process based on the learning data belonging to the second domain. Specifically, the parameters to be updated are W_1 and W_2 in the second encoder unit 22 (see FIG. 2), W_lowd in the second dimension reduction unit 23 (see FIG. 4), W_3 and W_4 in the second decoder unit 24 (see FIG. 3), and W_1 and W_2 in the shared encoder unit 31 (see FIG. 2).

When the processing of either step S6 or step S10 is completed, the process moves to step S11.

In step S11, the translation device 1 determines whether all of the learning data has been processed. If all of the learning data has been processed (step S11: YES), the entire processing of this flowchart ends. If not, that is, if one or more unprocessed sentence pairs remain (step S11: NO), the process returns to step S1 to process the next sentence pair.
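The control flow of steps S1 to S11 can be sketched as follows. This is only an outline of the branching and of which parameter groups are touched by each sentence pair. The group names, the `update_step` callback (standing in for the forward pass, error calculation, and backpropagation of steps S3 to S6 or S7 to S10), and the placeholder sentence strings are all hypothetical.

```python
from collections import Counter

# Parameter groups updated for each domain (names are illustrative).
FIRST_DOMAIN_GROUPS = ("encoder_1", "reducer_1", "decoder_1", "shared_encoder")
SECOND_DOMAIN_GROUPS = ("encoder_2", "reducer_2", "decoder_2", "shared_encoder")


def train_one_pass(sentence_pairs, update_step):
    """Process every sentence pair once and count updates per parameter group.

    sentence_pairs: iterable of (domain, source_sentence, target_sentence),
    where domain is 1 or 2 and is given with the learning data.
    update_step: hypothetical callback performing the actual learning step.
    """
    update_counts = Counter()
    for domain, source, target in sentence_pairs:  # S1: pick the next pair
        if domain == 1:                            # S2: first domain?
            groups = FIRST_DOMAIN_GROUPS           # S3 to S6
        else:
            groups = SECOND_DOMAIN_GROUPS          # S7 to S10
        update_step(groups, source, target)
        update_counts.update(groups)
    return update_counts                           # S11: all pairs processed


# Usage with a recording stub in place of a real learning step.
calls = []


def record_step(groups, source, target):
    calls.append((groups, source, target))


pairs = [
    (1, "source-a", "target-a"),  # placeholder strings, not real data
    (2, "source-b", "target-b"),
    (1, "source-c", "target-c"),
]
counts = train_one_pass(pairs, record_step)
```

In this toy run the shared encoder is updated for all three pairs, whereas each domain-specific group is updated only for the pairs of its own domain, which mirrors the division of roles described for the shared encoder unit 31.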

As described above, during the learning process the translation device 1 uses a large amount of learning data (for example, a set of pairs each consisting of a Japanese sentence and the English sentence obtained by translating it) to correct the parameters in the encoders, the decoders, and the dimension reduction units.

As an example, suppose that the sentence pair in the learning data consists of the Japanese sentence "私は京都に行く。" and the English sentence "I go to Kyoto." (the correct sentence), and that the result of translating the Japanese sentence (the output from the decoder) is "I went to Tokyo." (the output sentence). The correct sentence and the output sentence differ in two respects. First, the output sentence contains "went" where the correct sentence contains "go". Second, the output sentence contains "Tokyo" where the correct sentence contains "Kyoto". Based on these differences, the translation device 1 calculates the value of a loss function, for example the cross entropy. That is, the translation device 1 calculates the error. The translation device 1 then learns the parameter values so as to reduce that error.
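A cross-entropy error for this example can be sketched as follows. The sketch assumes that the decoder yields a probability distribution over a small vocabulary at each output position and that the error is averaged over positions; the seven-word vocabulary and the probability value 0.9 are invented for illustration and are not taken from the embodiment.

```python
import math


def cross_entropy(predicted_probs, reference_indices):
    """Mean negative log-probability assigned to the reference words."""
    total = 0.0
    for probs, ref in zip(predicted_probs, reference_indices):
        total -= math.log(probs[ref])
    return total / len(reference_indices)


vocab = ["I", "go", "went", "to", "Kyoto", "Tokyo", "."]


def peaked(token, p=0.9):
    """Toy distribution: probability p on `token`, the rest spread evenly."""
    rest = (1.0 - p) / (len(vocab) - 1)
    return [p if v == token else rest for v in vocab]


reference = ["I", "go", "to", "Kyoto", "."]
reference_indices = [vocab.index(t) for t in reference]

# A decoder that favours "I went to Tokyo." versus one that favours
# the correct sentence.
wrong_output = [peaked(t) for t in ["I", "went", "to", "Tokyo", "."]]
right_output = [peaked(t) for t in reference]

loss_wrong = cross_entropy(wrong_output, reference_indices)
loss_right = cross_entropy(right_output, reference_indices)
```

The two mismatched positions ("went" for "go" and "Tokyo" for "Kyoto") assign low probability to the reference words, so `loss_wrong` is larger than `loss_right`; reducing this value is what drives the parameter updates.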

Although the case where a sentence pair in the learning data consists of a Japanese sentence (input side) and an English sentence (output side) has been described here as an example, the languages of the input side and the output side may be reversed. The learning data may also contain sentences in languages other than Japanese and English.

As described above, in the learning process the translation device 1 updates the parameters in the first encoder unit 12, the first dimension reduction unit 13, and the first decoder unit 14 based on the learning data belonging to the first domain, using the features of the sentences of the first domain. Likewise, the translation device 1 updates the parameters in the second encoder unit 22, the second dimension reduction unit 23, and the second decoder unit 24 based on the learning data belonging to the second domain, using the features of the sentences of the second domain. The translation device 1 also updates the parameters of the shared encoder unit 31 using the features of the sentences of both the first domain and the second domain. In other words, through the learning process the value of each parameter comes to represent the features of the first domain, of the second domain, or, in the case of the shared encoder unit 31, of both.

In summary, the learning process is as follows. When a sentence pair consisting of a source-language sentence and a target-language sentence belonging to the first domain is input as learning data, the learning process generates an output sentence from the source-language input sentence through the processing of the first encoder unit 12, the shared encoder unit 31, and the first decoder unit 14. Based on the difference between the target-language sentence of the sentence pair and that output sentence, it updates the parameters of the encoding process in the first encoder unit 12, the parameters of the encoding process in the shared encoder unit 31, and the parameters of the decoding process in the first decoder unit 14. In this case, the learning process also updates the parameters of the first dimension reduction unit 13. When a sentence pair consisting of a source-language sentence and a target-language sentence belonging to the second domain is input as learning data, the learning process generates an output sentence from the source-language input sentence through the processing of the second encoder unit 22, the shared encoder unit 31, and the second decoder unit 24. Based on the difference between the target-language sentence of the sentence pair and that output sentence, it updates the parameters of the encoding process in the second encoder unit 22, the parameters of the encoding process in the shared encoder unit 31, and the parameters of the decoding process in the second decoder unit 24. In this case, the learning process also updates the parameters of the second dimension reduction unit 23.

FIG. 6 is a flowchart showing the procedure of the translation process of the translation device according to this embodiment. As a precondition for this translation process, the models in the translation device 1 have already been trained. A sentence to be translated, written in the source language, is supplied to the translation device 1 from the outside. The sentence to be translated belongs to either the first domain or the second domain, and the domain to which it belongs is known. The procedure of the translation process is described below along this flowchart.

In step S21, the translation device 1 acquires an input sentence given from outside. This input sentence is the sentence to be translated, written in the source language (for example, Japanese).

In step S22, the translation device 1 determines whether or not the input sentence acquired in step S21 belongs to the first domain, and branches the processing according to the result. Information indicating whether the input sentence belongs to the first domain or the second domain is given together with the input sentence. If the input sentence belongs to the first domain (step S22: YES), the first input unit 11 processes the input sentence, and the processing proceeds to step S23. If the input sentence does not belong to the first domain, that is, if it belongs to the second domain (step S22: NO), the second input unit 21 processes the input sentence, and the processing proceeds to step S27. When proceeding to step S23, the translation device 1 performs the processing of steps S23 to S26 in order. When proceeding to step S27, the translation device 1 performs the processing of steps S27 to S30 in order.

In step S23, the first input unit 11 divides the input sentence into words and converts them, as appropriate, into time-series vector data. The first input unit 11 passes this time-series vector data to the first encoder unit 12 and the shared encoder unit 31. The first encoder unit 12 and the shared encoder unit 31 each process the time-series data passed from the first input unit 11, and each outputs a content vector (vector c_1 in FIG. 2). These content vectors correspond to the vector h_enc and the vector h_senc shown in FIG. 4, respectively.

In step S24, the first dimension reduction unit 13 acquires the vector h_enc and the vector h_senc from the first encoder unit 12 and the shared encoder unit 31, respectively. As shown in FIG. 4, the first dimension reduction unit 13 concatenates the vector h_enc and the vector h_senc to generate the vector h_conc. The first dimension reduction unit 13 then reduces the dimension of the vector h_conc based on the parameter W_lowd and outputs the vector h_lowd.
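The concatenation and dimension reduction of step S24 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the reduction is taken to be a linear transformation without a bias term (the description states only that it is a linear transformation based on W_lowd), W_lowd is stored as a list of rows, and the function names `concat` and `reduce_dimension` are hypothetical.

```python
def concat(h_enc, h_senc):
    """Concatenate the two encoder outputs into h_conc."""
    return list(h_enc) + list(h_senc)


def reduce_dimension(w_lowd, h_conc):
    """Compute h_lowd = W_lowd * h_conc (assumed: linear map, no bias).

    w_lowd is a list of rows; each row must have len(h_conc) entries,
    and the number of rows is the dimension of h_lowd.
    """
    for row in w_lowd:
        if len(row) != len(h_conc):
            raise ValueError("each row of w_lowd must match len(h_conc)")
    return [sum(w * x for w, x in zip(row, h_conc)) for row in w_lowd]


# Toy example: two 2-dimensional encoder outputs reduced from 4 to 2 dimensions.
h_conc = concat([1.0, 2.0], [3.0, 4.0])
w_lowd = [
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
]
h_lowd = reduce_dimension(w_lowd, h_conc)
```

In an actual model, W_lowd would be learned together with the encoder and decoder parameters rather than fixed by hand as in this toy example.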

In step S25, the translation device 1 inputs the vector h_lowd output from the first dimension reduction unit 13 to the first decoder unit 14. The first decoder unit 14 decodes the vector h_lowd and outputs time-series data as the result of the decoding process. The output time-series data is converted into a sequence of words.

In step S26, the first output unit 15 creates an output sentence based on the sequence of words output by the first decoder unit 14 in step S25. The first output unit 15 outputs this output sentence as the translation result.

In steps S27 to S30, the second input unit 21, the second encoder unit 22, the second dimension reduction unit 23, the second decoder unit 24, the second output unit 25, and the shared encoder unit 31 of the translation device 1 perform the same processing as described for steps S23 to S26. However, whereas the processing of steps S23 to S26 concerns the first domain, the processing of steps S27 to S30 concerns the second domain. As a result of this series of processing, the second output unit 25 outputs an output sentence as the translation result.

When the processing of either step S26 or step S30 ends, the translation device 1 ends the processing of the entire flowchart.
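The flow of steps S21 to S30 can be sketched in Python as follows. This is an illustration only: each unit is represented by a plain callable, the dictionary keys (`input`, `encoder`, `reducer`, `decoder`, `output`) and the function `translate` are hypothetical names, and the encoders are assumed to return Python lists so that concatenation is list addition.

```python
def translate(sentence, domain, models, shared_encoder):
    """Translate `sentence` with the model of its (known) domain.

    `models` maps a domain number to a dict of callables; `shared_encoder`
    is the single encoder used by every domain.
    """
    if domain not in models:                 # S22: branch on the given domain
        raise ValueError(f"unknown domain: {domain}")
    m = models[domain]
    series = m["input"](sentence)            # S23/S27: word split, vector data
    h_enc = m["encoder"](series)             # domain-specific content vector
    h_senc = shared_encoder(series)          # shared content vector
    h_lowd = m["reducer"](h_enc + h_senc)    # S24/S28: concatenate, then reduce
    words = m["decoder"](h_lowd)             # S25/S29: decode to a word sequence
    return m["output"](words)                # S26/S30: build the output sentence
```

Because the domain is supplied with the input sentence, the function performs no domain classification of its own; it only selects which domain-specific units to combine with the shared encoder.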

As an example, when the input sentence is "彼は京都に出かけた。" ("He went to Kyoto.") and the input sentence belongs to the second domain (for example, the domain of travel conversation), the translation device 1 performs the following processing. The translation device 1, which has a translation model for the second domain, obtains the content vectors h_enc and h_senc in the second encoder unit 22 and the shared encoder unit 31, respectively. The second dimension reduction unit 23 performs the dimension reduction process based on these content vectors h_enc and h_senc and obtains the vector h_lowd. The second decoder unit 24 outputs the word sequence "He/went/to/Kyoto/." based on the vector h_lowd. Based on the output of the second decoder unit 24, the second output unit 25 outputs the output sentence "He went to Kyoto." as the translation result.

As described above, in this embodiment the translation device 1 has a plurality of (two) translation models, one for each domain. In each translation model, the translation device 1 has parts specific to each domain (the first encoder unit 12, the first dimension reduction unit 13, the first decoder unit 14, the second encoder unit 22, the second dimension reduction unit 23, and the second decoder unit 24) and a part shared by the domains (the shared encoder unit 31). When the translation device 1 performs the learning process using learning data belonging to a certain domain, it trains both the part specific to that domain and the shared part, and updates the model parameters. With this configuration, domain-specific knowledge accumulates in the domain-specific parts, and knowledge common across domains accumulates in the shared part. In other words, the translation model of a given domain benefits not only from training of its domain-specific part with learning data belonging to that domain, but also from training of the shared part with learning data belonging to other domains. That is, the model of a given domain can be trained using the learning data of other domains as well. Consequently, the amount of training can be increased relative to the amount of learning data that must be prepared, so the learning data is used more efficiently. Preparing learning data is often a costly task, and this embodiment can reduce the cost required for learning (the cost of preparing learning data).

As an example, when the first domain is news announcement text and the second domain is subtitle text of television programs, this embodiment makes the following contribution. Preparing news announcement text for the first domain is costly. For the subtitle text of television programs in the second domain, a large amount of existing material can be used as learning data at low cost. The learning data of the first domain contributes to learning sentence expressions, including phrasing peculiar to news. The learning data of the second domain does not consist of sentences from news scripts to be announced (spoken language), but it contributes to learning that enriches sentence expressions (vocabulary and so on) in various fields such as politics, economics, sports, and entertainment. The shared encoder unit 31 of this embodiment accumulates knowledge obtained not only from the learning data of the first domain but also from the learning data of the second domain. The first decoder unit 14 outputs an output sentence that reflects both the knowledge dedicated to the first domain and the knowledge common to the domains accumulated in the model of the shared encoder unit 31.

The first dimension reduction unit 13 reduces the number of dimensions of a vector by a linear transformation. The first dimension reduction unit 13 reduces the input vector to a number of dimensions that is sufficient for the first decoder unit 14 to produce appropriate output. In other words, the processing of the first dimension reduction unit 13 can remove redundant information contained in the outputs of the first encoder unit 12 and the shared encoder unit 31. This makes the computation faster and more efficient.

The above description has mainly concerned the translation model of the first domain. In the translation device 1, however, the first domain and the second domain are symmetrical, and the above description applies equally to the translation model of the second domain.

[Second embodiment]
Next, a second embodiment of the present invention will be described. Descriptions of matters already covered in the previous embodiment may be omitted below. The description here focuses on matters specific to this embodiment.

FIG. 7 is a block diagram showing the schematic functional configuration of the translation device (learning device) according to this embodiment. As illustrated, the translation device 2 includes a first input unit 11, a first encoder unit 12, a first decoder unit 14, a first output unit 15, a second input unit 21, a second encoder unit 22, a second decoder unit 24, a second output unit 25, and a shared encoder unit 31. In this embodiment, the first encoder unit 12, the first decoder unit 14, and the shared encoder unit 31 may be collectively called the first translation model unit 217. Similarly, the second encoder unit 22, the second decoder unit 24, and the shared encoder unit 31 may be collectively called the second translation model unit 227. The first translation model unit 217 and the second translation model unit 227 function as translation models for mutually different domains.

That is, unlike the translation device 1 of the first embodiment, the translation device 2 of this embodiment has neither a first dimension reduction unit nor a second dimension reduction unit. The translation device 2 concatenates the semantic vector output from the first encoder unit 12 and the semantic vector output from the shared encoder unit 31, but does not reduce the dimension of the concatenated vector. The vector obtained by concatenating the semantic vector output from the first encoder unit 12 and the semantic vector output from the shared encoder unit 31 is used as the input to the first decoder unit 14 as it is, without dimension reduction. Similarly, the semantic vector output from the second encoder unit 22 and the semantic vector output from the shared encoder unit 31 are concatenated and then used as the input to the second decoder unit 24 as they are, without dimension reduction.

With the above configuration of this embodiment, it is possible to train a translation model, and to perform machine translation using a trained translation model, without using the dimension reduction process.

[Third embodiment]
Next, a third embodiment of the present invention will be described. Descriptions of matters already covered in the preceding embodiments may be omitted below. The description here focuses on matters specific to this embodiment.

FIG. 8 is a block diagram showing the schematic functional configuration of the translation device (learning device) according to this embodiment. As illustrated, the translation device 3 performs the translation process and the learning process for each of n domains (n ≥ 3). Specifically, the translation device 3 includes translation models for the first domain through the n-th domain. The translation model for the i-th domain (1 ≤ i ≤ n) in the translation device 3 includes an i-th input unit i-1, an i-th encoder unit i-2, an i-th dimension reduction unit i-3, an i-th decoder unit i-4, an i-th output unit i-5, and a shared encoder unit 331. The shared encoder unit 331 is shared by the translation models of the first domain through the n-th domain.

In this embodiment, during the learning process, the translation device 3 acquires a sentence pair (sentences written in the source language and the target language) belonging to one of the first through n-th domains. Using a sentence pair belonging to the i-th domain (1 ≤ i ≤ n), the translation device 3 learns the parameters included in the translation model of the i-th domain (including the parameters of the shared encoder unit 331). During the translation process, the translation device 3 acquires an input sentence (source-language sentence) belonging to one of the first through n-th domains. When the input sentence belongs to the i-th domain, the translation device 3 uses the translation model of the i-th domain to output the translated sentence (target-language sentence) corresponding to the input sentence.
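The n-domain arrangement can be sketched in Python as follows. This is an illustration only: the class `SharedEncoder` and the function `build_models` are hypothetical, each unit is represented merely by a label that mirrors the reference numerals i-1 to i-5, and the single `SharedEncoder` instance stands in for the shared encoder unit 331.

```python
class SharedEncoder:
    """Placeholder for the shared encoder unit 331 (parameters omitted)."""


def build_models(n):
    """Build translation-model records for domains 1..n sharing one encoder."""
    if n < 3:
        raise ValueError("this sketch follows the third embodiment, n >= 3")
    shared = SharedEncoder()
    models = {}
    for i in range(1, n + 1):
        models[i] = {
            "input": f"{i}-1",         # i-th input unit
            "encoder": f"{i}-2",       # i-th encoder unit
            "reducer": f"{i}-3",       # i-th dimension reduction unit
            "decoder": f"{i}-4",       # i-th decoder unit
            "output": f"{i}-5",        # i-th output unit
            "shared_encoder": shared,  # same object in every model
        }
    return models, shared
```

Holding one shared object, rather than one copy per domain, is what allows a sentence pair of any domain to contribute to the parameters that every domain's model reads.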

According to this embodiment, knowledge for translation can be shared not only between two domains but among three or more domains. In other words, because the shared encoder unit 331 of the translation device 3 is trained using sentence pairs belonging to three or more domains, it has parameters based on the sentence pairs of all these domains.

In the configuration shown in FIG. 8, the translation model of the i-th domain includes the i-th dimension reduction unit i-3. This i-th dimension reduction unit i-3 may be omitted. The details of a configuration that omits the function of reducing the dimension of the content vector are as already described in the second embodiment. That is, the third embodiment and the second embodiment may be combined.

[Fourth embodiment]
Next, a fourth embodiment of the present invention will be described. Descriptions of matters already covered in the preceding embodiments may be omitted below. The description here focuses on matters specific to this embodiment.

FIG. 9 is a block diagram showing the schematic functional configuration of the translation device (learning device) according to this embodiment. As illustrated, the translation device 4 includes a first input unit 11, a first encoder unit 12, a first dimension reduction unit 13, a first decoder unit 14, a first output unit 15, a second input unit 21, a second encoder unit 22, a second dimension reduction unit 23, a second decoder unit 24, a second output unit 25, a shared encoder unit 31, a first orthogonal error calculation unit 419, and a second orthogonal error calculation unit 429. The functions of the first input unit 11, the first encoder unit 12, the first dimension reduction unit 13, the first decoder unit 14, the first output unit 15, the second input unit 21, the second encoder unit 22, the second dimension reduction unit 23, the second decoder unit 24, the second output unit 25, and the shared encoder unit 31 are as already described in the preceding embodiments. The feature of this embodiment is that the translation device 4 has the first orthogonal error calculation unit 419 and the second orthogonal error calculation unit 429.

The first orthogonal error calculation unit 419 calculates the orthogonal error between the semantic vectors that the first encoder unit 12 and the shared encoder unit 31 respectively output when they encode the same input sentence. This orthogonal error is denoted L_diff. The orthogonal error L_diff is calculated as a non-negative value. When the semantic vector output from the first encoder unit 12 and the semantic vector output from the shared encoder unit 31 are completely orthogonal, the value of L_diff is 0. The lower the degree of orthogonality between the two semantic vectors, the larger the value of L_diff.
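One quantity with the properties described above can be sketched in Python as follows. This passage does not give a formula for L_diff, so the squared inner product used here is an assumption chosen only because it is zero for orthogonal vectors and grows as the vectors become less orthogonal; the function name `orthogonality_error` is hypothetical, and the two vectors are assumed to have the same number of dimensions.

```python
def orthogonality_error(h_enc, h_senc):
    """Squared inner product of two equal-length vectors (assumed form of L_diff).

    Returns 0.0 when the vectors are orthogonal and a larger non-negative
    value as their inner product moves away from zero.
    """
    if len(h_enc) != len(h_senc):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(a * b for a, b in zip(h_enc, h_senc))
    return dot * dot
```

Other penalties, for example one normalized by the vector lengths, would satisfy the same description; the choice here is only for brevity.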

The second orthogonal error calculation unit 429 calculates the orthogonal error L_diff between the semantic vectors that the second encoder unit 22 and the shared encoder unit 31 respectively output when they encode the same input sentence. The orthogonal error calculated by the second orthogonal error calculation unit 429 is of the same kind, and has the same meaning, as the orthogonal error calculated by the first orthogonal error calculation unit 419 described above.

During the learning process, the translation device 4 adjusts the parameter values based also on the orthogonal error L_diff calculated by the first orthogonal error calculation unit 419 or the second orthogonal error calculation unit 429.

For the translation model of the first domain, error backpropagation based on the difference between the output sentence produced by the first decoder unit 14 for a sentence pair belonging to the first domain and the correct data (the target-language sentence) included in that sentence pair has already been described in the first embodiment and elsewhere. In this embodiment, the translation device 4 updates the parameters by error backpropagation using not only the difference between the output sentence produced by the first decoder unit 14 and the correct data (the target-language sentence) included in the sentence pair, but also the orthogonal error L_diff calculated by the first orthogonal error calculation unit 419. When the error between the output sentence and the correct data is denoted L_output, the total error L that the translation device 4 uses to update the parameters is expressed by the following equation, where α and β are weight values that are set as appropriate.

L = α · L_output + β · L_diff
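The weighted sum above can be written as a small Python function. This is an illustration only: the function name `total_error` and the numeric values in the example are hypothetical, and L_output and L_diff are assumed to be already computed as plain numbers.

```python
def total_error(l_output, l_diff, alpha, beta):
    """Return L = alpha * L_output + beta * L_diff."""
    return alpha * l_output + beta * l_diff


# Example with arbitrary values: the orthogonal error is weighted less
# heavily than the output error.
example = total_error(l_output=0.5, l_diff=4.0, alpha=2.0, beta=0.25)
```

Setting β to 0 removes the orthogonal-error term, so that only the difference between the output sentence and the correct data drives the parameter update.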

The parameter update of the translation model of the second domain based on the orthogonal error L_diff calculated by the second orthogonal error calculation unit 429 is the same as that of the translation model of the first domain described above.

According to this embodiment, the first orthogonal error calculation unit 419 or the second orthogonal error calculation unit 429 calculates the orthogonal error L_diff between the semantic vector output from the first encoder unit 12 or the second encoder unit 22, respectively (the dedicated encoder specific to each domain), and the semantic vector output from the shared encoder unit 31. During the learning process, the translation device 4 adjusts the parameters by error backpropagation based also on this orthogonal error L_diff. This learning process acts in the direction of increasing the orthogonality between the vector output from the first encoder unit 12 or the second encoder unit 22 and the vector output from the shared encoder unit 31. That is, training with a sufficient amount of learning data increases the orthogonality between the vector output from the first encoder unit 12 or the second encoder unit 22 and the vector output from the shared encoder unit 31. As a result, the model of the dedicated encoder specific to each domain and the model of the shared encoder do not overlap. In other words, the redundancy between the dedicated encoder model and the shared encoder model is reduced, and the translation models can be trained efficiently.

In the configuration shown in FIG. 9, the translation models of the first domain and the second domain include the first dimension reduction unit 13 and the second dimension reduction unit 23, respectively. The first dimension reduction unit 13 and the second dimension reduction unit 23 may be omitted. The details of a configuration that omits the function of reducing the dimension of the content vector are as already described in the second embodiment. That is, the fourth embodiment and the second embodiment may be combined.

In the configuration shown in FIG. 9, the number of domains is two. The fourth embodiment may also be implemented with three or more domains. That is, the fourth embodiment and the third embodiment may be combined. Furthermore, the fourth embodiment, the third embodiment, and the second embodiment may be combined.

A plurality of embodiments have been described above, but the following modifications may also be implemented. Where combination is possible, a plurality of modifications may be combined.

[First modification] In the embodiments described above, when the translation device (learning device) has a dimension reduction unit, every domain has one. As a modification, only the translation models of some of the domains may have a dimension reduction unit. For example, only one of two domains may have a dimension reduction unit. As another example, only any one or more of three or more domains may have a dimension reduction unit.

[Second modification] Domains such as news announcement text and broadcast subtitle text have been given as examples in some passages, but each of the above embodiments may be applied to other domains.

[Third modification] In each of the above embodiments, a single device serves as both the learning device and the translation device. As a modification, the learning device and the translation device may be separate devices. In this case, the knowledge obtained as a result of the learning process in the learning device (parameter values and the like) is copied to the translation device, for example as a data file, so that the translation device can perform the translation process using the learning result.
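Copying learned parameters between separate devices as a data file can be sketched in Python as follows. This is an illustration only: the description does not specify a file format, so JSON is an assumption, and the function names, the file name `params.json`, and the parameter names in the example are hypothetical.

```python
import json
import os
import tempfile


def save_parameters(params, path):
    """Learning-device side: write the learned parameter values to a file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(params, f)


def load_parameters(path):
    """Translation-device side: read the parameter values back."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def round_trip(params):
    """Save and reload through a temporary file, standing in for the copy."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "params.json")
        save_parameters(params, path)
        return load_parameters(path)
```

A translation device that loads such a file needs the same model structure as the learning device, since the file carries only the parameter values and not the network definition.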

At least some of the functions of the translation device (learning device) in the embodiments described above (including the modifications) can be realized by a computer. In that case, a program for realizing these functions may be recorded on a computer-readable recording medium, and the functions may be realized by having a computer system read and execute the program recorded on the recording medium. The "computer system" referred to here includes an OS and hardware such as peripheral devices. The "computer-readable recording medium" refers to portable media such as flexible disks, magneto-optical disks, ROMs, CD-ROMs, DVD-ROMs, and USB memories, and to storage devices such as hard disks built into a computer system. The "computer-readable recording medium" may further include a medium that holds a program dynamically for a short time, such as a communication line used when the program is transmitted via a network such as the Internet or a communication line such as a telephone line, and a medium that holds the program for a certain period of time, such as volatile memory inside a computer system serving as the server or client in that case. The program may realize only some of the functions described above, or may realize those functions in combination with a program already recorded in the computer system.

The features of the embodiments and modifications described above are summarized as follows.

When at least one of the first dimension reduction unit and the second dimension reduction unit is provided, the configuration is as follows. The first dimension reduction unit 13 takes the concatenated vector obtained by arranging the elements of the first semantic vector, output as the result of the encoding process by the first encoder unit 12, together with the elements of the common semantic vector, output as the result of the encoding process by the shared encoder unit 31. It reduces the dimension of this concatenated vector based on the parameters of the dimension reduction process for the first domain, and outputs the result as the first reduced-dimension vector. The first decoder unit 14 then generates the output sentence corresponding to the input sentence based on the first reduced-dimension vector output by the first dimension reduction unit 13 and the parameters of the decoding process for the first domain. In the learning process in this case, the parameters of the dimension reduction process in the first dimension reduction unit 13 are also updated based on the difference between the target-language sentence of the sentence pair included in the learning data and the output sentence from the first decoder unit 14. The second dimension reduction unit 23 operates in the same way as the first dimension reduction unit 13 described here.

When learning is based on the orthogonal error between encoder outputs, the configuration is as follows. The learning device includes a first orthogonal error calculation unit 419 and a second orthogonal error calculation unit 429. The first orthogonal error calculation unit 419 calculates a first orthogonal error, which is the orthogonal error between the first semantic vector output as the result of encoding by the first encoder unit 12 and the common semantic vector output as the result of encoding by the shared encoder unit 31. The second orthogonal error calculation unit 429 calculates a second orthogonal error, which is the orthogonal error between the second semantic vector output as the result of encoding by the second encoder unit 22 and the common semantic vector output as the result of encoding by the shared encoder unit 31.
In adjusting the parameters by backpropagation or a similar method, the parameters of the encoding process in the first encoder unit 12, the parameters of the encoding process in the shared encoder unit 31, and the parameters of the decoding process in the first decoder unit 14 are updated based not only on the difference between the target-language sentence of the sentence pair in the learning data and the output sentence output from the first decoder unit 14, but also on the first orthogonal error calculated by the first orthogonal error calculation unit 419. When a dimension reduction unit is provided, the parameters of the dimension reduction process are also updated. Likewise, the parameters of the encoding process in the second encoder unit 22, the parameters of the encoding process in the shared encoder unit 31, and the parameters of the decoding process in the second decoder unit 24 are updated based not only on the difference between the target-language sentence of the sentence pair in the learning data and the output sentence output from the second decoder unit 24, but also on the second orthogonal error calculated by the second orthogonal error calculation unit 429. When a dimension reduction unit is provided, the parameters of the dimension reduction process are also updated. The parameter adjustment based on the orthogonal error described above may be performed for only some of the domains.
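The following sketch shows one way the orthogonal error could enter the training objective. The passage above does not give a formula here, so the squared inner product used as the orthogonal error, the additive weighting in `total_loss`, and all function and parameter-group names are assumptions for illustration only.

```python
def orthogonal_error(domain_vec, common_vec):
    """Squared inner product of two vectors; zero when they are orthogonal.

    Assumption: this particular penalty is chosen for illustration.
    """
    if len(domain_vec) != len(common_vec):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(domain_vec, common_vec))
    return dot * dot


def total_loss(translation_loss, orth_error, orth_weight=1.0):
    """Combine the output-sentence difference with the orthogonal error.

    orth_weight is a hypothetical balancing coefficient.
    """
    return translation_loss + orth_weight * orth_error


def parameters_to_update(domain, has_reduction_unit=False):
    """Name the parameter groups touched by a sentence pair from `domain`."""
    if domain not in (1, 2):
        raise ValueError("this sketch only covers the first and second domains")
    ordinal = "first" if domain == 1 else "second"
    groups = [f"{ordinal}_encoder", "shared_encoder", f"{ordinal}_decoder"]
    if has_reduction_unit:
        groups.append(f"{ordinal}_dimension_reduction")
    return groups


# Orthogonal vectors give no penalty; overlapping vectors are penalised.
print(orthogonal_error([1.0, 0.0], [0.0, 1.0]))  # 0.0
print(orthogonal_error([1.0, 2.0], [3.0, 4.0]))  # 121.0
print(parameters_to_update(1, has_reduction_unit=True))
```

A penalty of this kind pushes the domain-specific encoder and the shared encoder toward non-overlapping representations, which matches the stated intent of separating domain-specific features from features common to the domains.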

At least the parameters of the encoding process in the first encoder unit 12, the parameters of the encoding process in the shared encoder unit 31, and the parameters of the decoding process in the first decoder unit 14 of the translation device may be those obtained by the learning process of the above embodiments. The same applies to the parameters of the dimension reduction process and to the parameters of the other domains.

Although embodiments of the present invention have been described in detail above with reference to the drawings, the specific configuration is not limited to these embodiments and also includes designs and the like that do not depart from the gist of the invention.

[Demonstration experiment]
The results of the demonstration experiment are described below. In the experiment, the source language was English and the target language was Japanese. A patent corpus (NTCIR, NII Testbeds and Community for Information access Research) was used as the language resource of the first domain. Scientific and technical papers (ASPEC) were used as the second domain. The test set of the first-domain patent corpus (NTCIR) was used for evaluation. The first evaluated method is based on a translation model that uses the first encoder unit (12), which extracts the features of the first domain as knowledge, and the shared encoder unit (31), which extracts the features common to the domains as knowledge.
The second evaluated method adds, to the configuration of the first evaluated method, a constraint on the orthogonality between the outputs of the first encoder unit and the shared encoder unit when the parameters of these two encoders are learned (fourth embodiment). The first comparison method trains the first domain and the second domain independently of each other. The second comparison method is one of the latest methods in prior research, in which a domain tag (e.g., <NTCIR>) is attached to the head of the corpus. The second comparison method is described in the reference below.

Reference for the second comparison method: Chenhui Chu, Raj Dabre, and Sadao Kurohashi. An empirical comparison of domain adaptation methods for neural machine translation. In Proceedings of ACL, 2017.
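The domain-tag baseline can be sketched as a simple preprocessing step. The description only says that a tag such as `<NTCIR>` is attached at the head of the corpus; this sketch assumes the tag is prepended to each source sentence, and the function name `add_domain_tag` is hypothetical.

```python
def add_domain_tag(source_sentences, domain):
    """Prefix every source sentence with a tag naming its domain.

    Assumption: one tag per sentence, separated by a single space.
    """
    tag = f"<{domain}>"
    return [f"{tag} {sentence}" for sentence in source_sentences]


patent_sources = ["A data space is established in the memory."]
tagged = add_domain_tag(patent_sources, "NTCIR")
print(tagged[0])  # <NTCIR> A data space is established in the memory.
```

With this baseline a single model is trained on the tagged data of all domains, in contrast to the separate domain encoders and shared encoder of the evaluated methods.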

Accuracy was measured with BLEU, a commonly used evaluation metric for machine translation. The higher the BLEU value, the closer the output is judged to be to the reference translation. The BLEU score was 44.2 for the first comparison method and 46.0 for the second comparison method. The BLEU score was 48.1 for the first evaluated method and 49.76 for the second evaluated method. In other words, the first and second evaluated methods, which are the embodiments described above, scored higher than both the first and second comparison methods, so favorable results were obtained for both evaluated methods.
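The size of the improvement can be checked with a few lines of arithmetic. The scores are the ones reported above; the dictionary labels and variable names are introduced only for this illustration.

```python
# BLEU scores as reported in the demonstration experiment.
comparison_scores = {
    "comparison 1 (independent training)": 44.2,
    "comparison 2 (domain tag)": 46.0,
}
evaluated_scores = {
    "evaluated 1 (domain and shared encoders)": 48.1,
    "evaluated 2 (with orthogonality constraint)": 49.76,
}

# Compare each evaluated method against the stronger of the two baselines.
best_comparison = max(comparison_scores.values())
gains = {
    name: round(score - best_comparison, 2)
    for name, score in evaluated_scores.items()
}

for name, gain in gains.items():
    print(f"{name}: +{gain} BLEU over the best comparison method")
```

Against the stronger baseline (46.0), the gains come out to 2.1 and 3.76 BLEU points respectively.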

A translation example from the demonstration experiment is as follows. The source-language sentence is "A data space is established in the memory." The reference translation (the target-language sentence) is 「メモリ内にはデータ空間が形成される。」 ("A data space is formed within the memory."). The translation by the first comparison method is 「このメモリ内にはデータ空間が設けられる。」 ("A data space is provided within this memory."). The translation by the second comparison method is 「メモリにはデータ空間が形成される。」 ("A data space is formed in the memory."). The translation by the first evaluated method is 「メモリにはデータ空間が形成される。」. The translation by the second evaluated method is 「メモリ内にはデータ空間が形成される。」. The output of the second evaluated method is identical to the reference translation and is therefore the best result.
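The comparison in this example amounts to an exact-match check against the reference translation. A small sketch, using the Japanese strings quoted above and hypothetical labels for the four methods:

```python
reference = "メモリ内にはデータ空間が形成される。"

# Outputs of the four methods for the example sentence.
outputs = {
    "comparison 1": "このメモリ内にはデータ空間が設けられる。",
    "comparison 2": "メモリにはデータ空間が形成される。",
    "evaluated 1": "メモリにはデータ空間が形成される。",
    "evaluated 2": "メモリ内にはデータ空間が形成される。",
}

# True only for outputs that reproduce the reference character for character.
exact_match = {name: text == reference for name, text in outputs.items()}

for name, matched in exact_match.items():
    print(name, "matches reference" if matched else "differs from reference")
```

Exact match is a much stricter criterion than BLEU and is used here only to make the single example concrete.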

The present invention can be used in machine translation technology. Machine translation processing using the present invention can also be used, for example, in media industries such as broadcasting. However, the scope of application of the present invention is not limited to the examples given here.

1, 2, 3, 4 Translation device (learning device)
11 First input unit
12 First encoder unit
13 First dimension reduction unit
14 First decoder unit
15 First output unit
17 First translation model unit
21 Second input unit
22 Second encoder unit
23 Second dimension reduction unit
24 Second decoder unit
25 Second output unit
27 Second translation model unit
31 Shared encoder unit
217 First translation model unit
227 Second translation model unit
331 Shared encoder unit
419 First orthogonal error calculation unit
429 Second orthogonal error calculation unit

Claims

1. A learning device comprising:
a first encoder unit that encodes an input sentence in a source language belonging to a first domain, based on parameters of an encoding process in the first domain;
a second encoder unit that encodes an input sentence in the source language belonging to a second domain, based on parameters of an encoding process in the second domain;
a shared encoder unit that encodes the input sentence, belonging to either the first domain or the second domain, based on parameters of an encoding process shared between the first domain and the second domain;
a first decoder unit that generates an output sentence corresponding to the input sentence based on a first semantic vector output as a result of the encoding process in the first encoder unit, a common semantic vector output as a result of the encoding process in the shared encoder unit, and parameters of a decoding process in the first domain;
a second decoder unit that generates an output sentence corresponding to the input sentence based on a second semantic vector output as a result of the encoding process in the second encoder unit, the common semantic vector output as a result of the encoding process in the shared encoder unit, and parameters of a decoding process in the second domain;
a first orthogonal error calculation unit that calculates a first orthogonal error, which is an orthogonal error between the first semantic vector output as a result of the encoding process by the first encoder unit and the common semantic vector output as a result of the encoding process by the shared encoder unit; and
a second orthogonal error calculation unit that calculates a second orthogonal error, which is an orthogonal error between the second semantic vector output as a result of the encoding process by the second encoder unit and the common semantic vector output as a result of the encoding process by the shared encoder unit,
wherein
when a sentence pair, which is a pair of a source-language sentence and a target-language sentence belonging to the first domain, is input as learning data, the learning device generates an output sentence from the source-language input sentence by processing in the first encoder unit, the shared encoder unit, and the first decoder unit, and updates, based on the difference between the target-language sentence of the sentence pair and the output sentence, the parameters of the encoding process in the first encoder unit, the parameters of the encoding process in the shared encoder unit, and the parameters of the decoding process in the first decoder unit,
when a sentence pair, which is a pair of a source-language sentence and a target-language sentence belonging to the second domain, is input as learning data, the learning device generates an output sentence from the source-language input sentence by processing in the second encoder unit, the shared encoder unit, and the second decoder unit, and updates, based on the difference between the target-language sentence of the sentence pair and the output sentence, the parameters of the encoding process in the second encoder unit, the parameters of the encoding process in the shared encoder unit, and the parameters of the decoding process in the second decoder unit,
the learning device updates the parameters of the encoding process in the first encoder unit, the parameters of the encoding process in the shared encoder unit, and the parameters of the decoding process in the first decoder unit based on the first orthogonal error calculated by the first orthogonal error calculation unit together with the difference between the target-language sentence of the sentence pair and the output sentence output from the first decoder unit, and
the learning device updates the parameters of the encoding process in the second encoder unit, the parameters of the encoding process in the shared encoder unit, and the parameters of the decoding process in the second decoder unit based on the second orthogonal error calculated by the second orthogonal error calculation unit together with the difference between the target-language sentence of the sentence pair and the output sentence output from the second decoder unit.

2. The learning device according to claim 1, further comprising
a first dimension reduction unit that reduces the dimensionality of a concatenated vector, obtained by arranging the elements of the first semantic vector output as a result of the encoding process by the first encoder unit and the elements of the common semantic vector output as a result of the encoding process by the shared encoder unit, based on parameters of a dimension reduction process in the first domain, and outputs a first reduced-dimension vector as the result,
wherein the first decoder unit generates the output sentence corresponding to the input sentence based on the first reduced-dimension vector output by the first dimension reduction unit and the parameters of the decoding process in the first domain, and
the parameters of the dimension reduction process in the first dimension reduction unit are also updated based on the difference between the target-language sentence of the sentence pair and the output sentence from the first decoder unit.

3. The learning device according to claim 1 or 2, further comprising
a second dimension reduction unit that reduces the dimensionality of a concatenated vector, obtained by arranging the elements of the second semantic vector output as a result of the encoding process by the second encoder unit and the elements of the common semantic vector output as a result of the encoding process by the shared encoder unit, based on parameters of a dimension reduction process in the second domain, and outputs a second reduced-dimension vector as the result,
wherein the second decoder unit generates the output sentence corresponding to the input sentence based on the second reduced-dimension vector output by the second dimension reduction unit and the parameters of the decoding process in the second domain, and
the parameters of the dimension reduction process in the second dimension reduction unit are also updated based on the difference between the target-language sentence of the sentence pair and the output sentence from the second decoder unit.

4. A program for causing a computer to function as a learning device comprising:
a first encoder unit that encodes an input sentence in a source language belonging to a first domain, based on parameters of an encoding process in the first domain;
a second encoder unit that encodes an input sentence in the source language belonging to a second domain, based on parameters of an encoding process in the second domain;
a shared encoder unit that encodes the input sentence, belonging to either the first domain or the second domain, based on parameters of an encoding process shared between the first domain and the second domain;
a first decoder unit that generates an output sentence corresponding to the input sentence based on a first semantic vector output as a result of the encoding process in the first encoder unit, a common semantic vector output as a result of the encoding process in the shared encoder unit, and parameters of a decoding process in the first domain;
a second decoder unit that generates an output sentence corresponding to the input sentence based on a second semantic vector output as a result of the encoding process in the second encoder unit, the common semantic vector output as a result of the encoding process in the shared encoder unit, and parameters of a decoding process in the second domain;
a first orthogonal error calculation unit that calculates a first orthogonal error, which is an orthogonal error between the first semantic vector output as a result of the encoding process by the first encoder unit and the common semantic vector output as a result of the encoding process by the shared encoder unit; and
a second orthogonal error calculation unit that calculates a second orthogonal error, which is an orthogonal error between the second semantic vector output as a result of the encoding process by the second encoder unit and the common semantic vector output as a result of the encoding process by the shared encoder unit,
wherein
when a sentence pair, which is a pair of a source-language sentence and a target-language sentence belonging to the first domain, is input as learning data, the learning device generates an output sentence from the source-language input sentence by processing in the first encoder unit, the shared encoder unit, and the first decoder unit, and updates, based on the difference between the target-language sentence of the sentence pair and the output sentence, the parameters of the encoding process in the first encoder unit, the parameters of the encoding process in the shared encoder unit, and the parameters of the decoding process in the first decoder unit,
when a sentence pair, which is a pair of a source-language sentence and a target-language sentence belonging to the second domain, is input as learning data, the learning device generates an output sentence from the source-language input sentence by processing in the second encoder unit, the shared encoder unit, and the second decoder unit, and updates, based on the difference between the target-language sentence of the sentence pair and the output sentence, the parameters of the encoding process in the second encoder unit, the parameters of the encoding process in the shared encoder unit, and the parameters of the decoding process in the second decoder unit,
the learning device updates the parameters of the encoding process in the first encoder unit, the parameters of the encoding process in the shared encoder unit, and the parameters of the decoding process in the first decoder unit based on the first orthogonal error calculated by the first orthogonal error calculation unit together with the difference between the target-language sentence of the sentence pair and the output sentence output from the first decoder unit, and
the learning device updates the parameters of the encoding process in the second encoder unit, the parameters of the encoding process in the shared encoder unit, and the parameters of the decoding process in the second decoder unit based on the second orthogonal error calculated by the second orthogonal error calculation unit together with the difference between the target-language sentence of the sentence pair and the output sentence output from the second decoder unit.