JP2022019422A - Learning device, inference device, learning method, inference method and program

Info

Publication number: JP2022019422A
Application number: JP2020123246A
Authority: JP
Inventors: 小萌武; Xiaomeng Wu; 昭悟木村; Shogo Kimura; 邦夫柏野; Kunio Kashino; ガントゥグスアタルサイハン; Atarsaikhan Gantugs; 誠一内田; Seiichi Uchida
Original assignee: Kyushu University NUC; Nippon Telegraph and Telephone Corp
Current assignee: Kyushu University NUC; Nippon Telegraph and Telephone Corp
Priority date: 2020-07-17
Filing date: 2020-07-17
Publication date: 2022-01-27
Anticipated expiration: 2040-07-17
Also published as: JP7340199B2

Abstract

To provide a learning device, an inference device, a learning method, an inference method, and a program capable of realizing sequence alignment that can derive and use more complex feature representations without relying on manually designed feature representations, and that can also derive and use a monotonic and continuous correspondence function.

SOLUTION: A learning device comprises: an attention mechanism that uses a first feature sequence based on a first sequence and a second feature sequence based on a second sequence to generate a weight sequence representing the probability that elements of the first and second feature sequences correspond to each other; an objective function value derivation unit that derives, on the basis of the weight sequence, an objective function value that depends on the first and second feature sequences and on a label indicating whether the first and second sequences belong to the same class; and an update unit that generates a learning result by executing a prescribed learning process on the basis of the objective function value.

SELECTED DRAWING: Figure 2

Description

The present invention relates to a learning device, an inference device, a learning method, an inference method, and a program.

A sequence is a series of data items arranged in order. Examples of sequences include speech signals, acoustic signals, and biological signals. Each data item in a sequence is a numerical value, a numerical vector, or the like, and is called an element of the sequence. Each element is identified by an index such as a natural number.

Sequence alignment arranges the elements of multiple sequences so that mutually similar regions can be identified. Because alignment gives clues to how sequences are related, it is important in many applications, such as motion recognition, speech analysis, biosignal classification, and signature authentication. Alignment is needed in particular when two sequences differ by nonlinear temporal variation, that is, by local shifts and changes in speed. A representative alignment method is dynamic time warping (DTW) (see Non-Patent Document 1).

In dynamic time warping, the distance between every pair of elements of the two sequences is derived. The correspondences between the elements of the two sequences are then detected so that the sum of the distances between corresponding elements is minimized. A correspondence is a pair of elements that correspond to each other, or the pair of indices of those two elements.
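The procedure described above can be sketched in a few lines of Python. This is a minimal illustration of the classic dynamic programming formulation, not code from the patent: the function name `dtw_distance`, the absolute difference used as the element distance, and the three allowed step directions are all assumptions chosen for brevity.

```python
def dtw_distance(a, b):
    """Total distance of the best monotonic, continuous alignment of a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j]: minimum accumulated distance when aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # element distance (assumed: absolute difference)
            # each cell depends on three previously computed neighbours
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


slow = [0, 0, 1, 1, 2, 2, 3, 3]  # the same shape as `fast`, at half the speed
fast = [0, 1, 2, 3]
print(dtw_distance(slow, fast))
```

In this sketch the two toy sequences differ only in speed, so the accumulated distance is zero. Every cell of the table depends on cells computed just before it, which is one way to see why the computation is sequential by nature.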

Dynamic time warping is difficult to parallelize, which makes it hard to combine with deep learning. It also relies on manually designed feature representations, so its performance is insufficient when more complex feature representations are required. Dynamic time warping is therefore often not optimal for a given target application.

In fields such as machine translation, speech synthesis, and voice conversion, methods that use an attention mechanism are available as sequence alignment methods that combine easily with deep learning (see Non-Patent Documents 2 and 3). Given two sequences, a first sequence and a second sequence, the attention mechanism derives the weight of each element of the first sequence with respect to each element of the second sequence. Each derived weight represents the probability that the corresponding pair of elements of the first and second sequences is in correspondence. In alignment methods that use an attention mechanism, alignment is achieved by rearranging the elements of the first sequence on the basis of these weights.
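As a rough illustration of the weights described above, the following sketch computes softmax attention weights in plain Python. The dot-product score, the function names `attention_weights` and `realign`, and the toy vectors are assumptions made for this example; the cited non-patent documents use learned scoring functions.

```python
import math


def attention_weights(first, second):
    """Return W where W[j][i] is the weight of first[i] for second[j]; each row sums to 1."""
    weights = []
    for q in second:
        # dot-product score between q and every element of `first` (assumed scoring)
        scores = [sum(x * y for x, y in zip(q, k)) for k in first]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]  # subtract the max for numerical stability
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


def realign(first, weights):
    """Rearrange `first` to the length of the second sequence as weighted sums of its elements."""
    dim = len(first[0])
    return [[sum(w * x[d] for w, x in zip(row, first)) for d in range(dim)]
            for row in weights]


first = [[1.0, 0.0], [0.0, 1.0]]
second = [[0.0, 5.0], [5.0, 0.0], [5.0, 0.0]]
W = attention_weights(first, second)
aligned = realign(first, W)  # one rearranged element per element of `second`
```

Each row of `W` can be read as a probability distribution over the elements of the first sequence. Nothing in this computation ties one row to the next, so the rows are free to point anywhere in the first sequence.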

Non-Patent Document 1: Hiroaki Sakoe and Seibi Chiba, "Dynamic programming algorithm optimization for spoken word recognition," IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. 26, No. 1, pp. 43-49, 1978.
Non-Patent Document 2: Dzmitry Bahdanau, KyungHyun Cho, and Yoshua Bengio, "Neural machine translation by jointly learning to align and translate," In ICLR, 2015.
Non-Patent Document 3: Minh-Thang Luong, Hieu Pham, and Christopher D. Manning, "Effective approaches to attention-based neural machine translation," In EMNLP, pp. 1412-1421, 2015.

The correspondences between the elements of two sequences are expressed by a function (hereinafter, the "correspondence function") whose independent variable is an index of the second sequence and whose dependent variable is the index of the first sequence that corresponds to it. In applications such as matching or classification, the correspondence function is often monotonic and continuous for two sequences that belong to the same class. In contrast, it is often non-monotonic or discontinuous for two sequences that belong to different classes.

This property makes it possible to determine whether two sequences belong to the same class. For example, a monotonic and continuous correspondence function can be derived from the two sequences, and the sum of the distances between corresponding elements can then be derived. If this sum is large, the two sequences can be judged to belong to different classes.

Dynamic time warping is a representative sequence alignment method that exploits this property. However, it relies on manually designed feature representations, so its performance is insufficient when more complex feature representations are required. Dynamic time warping is therefore often not optimal for a given target application.

An attention mechanism, in contrast, does not depend on manually designed feature representations. Conventionally, however, attention mechanisms could not be used to solve applications such as matching or classification. One reason is that even if a conventional attention mechanism derives the probability that each pair of elements of the two sequences is in correspondence, the correspondence function cannot be derived from those probabilities. Another is that even if a conventional attention mechanism did derive a correspondence function, there is no way to guarantee that the function is monotonic and continuous.

Consequently, when the distance between sequences aligned by a conventional attention mechanism is applied to applications such as matching or classification, the derived distance is often very small. As a result, two sequences that belong to different classes often cannot be distinguished correctly.

FIG. 10 is a diagram showing examples of weight matrices. A weight matrix is a matrix that represents the probability that each pair of elements of two sequences is in correspondence. In FIG. 10, the first sequence is, as an example, "LISTEN", and the second sequence is, as an example, "SILENT". An element of the weight matrix whose value is "1" indicates that the corresponding pair of sequence elements is in correspondence.

The weight matrix shown on the left side of FIG. 10 is a weight matrix derived by a conventional attention mechanism. As it shows, a conventional attention mechanism derives a non-monotonic and discontinuous correspondence function. Although the two sequences belong to different classes, the sum of the distances between corresponding elements in the left-hand weight matrix of FIG. 10 is 0, so the two sequences cannot be distinguished correctly.
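The "LISTEN" / "SILENT" case can be reproduced with a short Python sketch. The matrix below is a reconstruction under the assumption that unconstrained attention matches each letter of the second sequence to the identical letter of the first; it is not copied from FIG. 10, and the 0/1 letter distance is likewise an assumption.

```python
first, second = "LISTEN", "SILENT"

# W[j][i] = 1 when second[j] and first[i] are the same letter (hard match, assumed)
W = [[1 if s == f else 0 for f in first] for s in second]

# Correspondence function: index in `first` matched to each index j of `second`
corr = [row.index(1) for row in W]

# Distance between matched elements: 0 if the letters are equal, otherwise 1
total = sum(0 if first[i] == second[j] else 1 for j, i in enumerate(corr))

# Monotonic means the matched index never decreases as j increases
is_monotonic = all(b >= a for a, b in zip(corr, corr[1:]))

print(corr, total, is_monotonic)  # prints: [2, 1, 0, 4, 5, 3] 0 False
```

The total distance is 0 even though the two words differ, and the matched indices jump backwards and forwards, which is the failure mode described above.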

For this reason, applications such as matching or classification need a sequence alignment method that can derive and use a monotonic and continuous correspondence function, like the arrangement of "1"s in the weight matrix shown on the right side of FIG. 10. With such a method, the distance or similarity between sequences is derived correctly, and it becomes possible to infer correctly whether sequences belong to different classes.

In applications such as speech synthesis or voice conversion, the goal is to convert the first sequence into the second sequence. Alignment is needed when the first and second sequences differ by nonlinear temporal variation, that is, by local shifts and changes in speed. For example, when English speech by a Japanese speaker is converted into English speech by an American speaker, the tempo of the speech varies, so the speech-signal sequences must be aligned. That is, the correspondences between the elements of the two sequences must be estimated, the first sequence must be aligned using the estimated correspondences, and the aligned first sequence must be converted into the second sequence. In such cases as well, the correspondence function between the two sequences is often monotonic and continuous.

However, methods that use a conventional attention mechanism have no function for guiding the training of the mathematical model so that the attention mechanism can derive a monotonic and continuous correspondence function. As a result, a long training time is often needed before the attention mechanism provides sufficient performance.

For this reason, the sequence alignment method described above is also needed in applications such as speech synthesis or voice conversion. Such a method makes it possible both to improve inference accuracy in tasks such as speech synthesis or voice conversion and to shorten training time.

In view of the above circumstances, an object of the present invention is to provide a learning device, an inference device, a learning method, an inference method, and a program capable of realizing sequence alignment that can derive and use more complex feature representations without relying on manually designed feature representations, and that can at the same time derive and use a monotonic and continuous correspondence function.

One aspect of the present invention is a learning device comprising: an attention mechanism that uses a first feature sequence based on a first sequence and a second feature sequence based on a second sequence to generate a weight matrix, which is a matrix representing the probability that each pair of elements of the first feature sequence and the second feature sequence is in correspondence; an objective function value derivation unit that derives, based on the weight matrix, an objective function value that depends on the first feature sequence, the second feature sequence, and a label indicating whether the first sequence and the second sequence belong to the same class; and an update unit that generates a learning result by executing a predetermined learning process based on the objective function value.

One aspect of the present invention is a learning device comprising: an attention mechanism that uses a first feature sequence based on a first sequence and a second feature sequence based on a second sequence to generate a weight matrix, which is a matrix representing the probability that each pair of elements of the first feature sequence and the second feature sequence is in correspondence; a decoding unit that derives the element of the second sequence at the current time based on the first feature sequence and on the weight of each element of the first feature sequence with respect to the element of the second sequence at the current time; an objective function value derivation unit that derives an objective function value that depends on a ground-truth sequence and the second sequence; and an update unit that generates a learning result by executing a predetermined learning process based on the objective function value.

One aspect of the present invention is an inference device comprising: an attention mechanism that uses a first feature sequence based on a first sequence and a second feature sequence based on a second sequence to generate a weight matrix, which is a matrix representing the probability that each pair of elements of the first feature sequence and the second feature sequence is in correspondence; a collation unit that derives the distance between the first sequence and the second sequence based on the first feature sequence, the second feature sequence, and the weight matrix; and an inference unit that generates an inference result by executing a predetermined inference process based on the distance.

One aspect of the present invention is an inference device comprising: an attention mechanism that uses a first feature sequence based on a first sequence and a second feature sequence based on a second sequence to generate a weight matrix, which is a matrix representing the probability that each pair of elements of the first feature sequence and the second feature sequence is in correspondence; a decoding unit that derives the second sequence based on the first feature sequence and the weight matrix; and an inference unit that generates an inference result by executing a predetermined inference process based on the second sequence.

One aspect of the present invention is a learning method executed by a learning device, the method including: an attention step of using a first feature sequence based on a first sequence and a second feature sequence based on a second sequence to generate a weight matrix, which is a matrix representing the probability that each pair of elements of the first feature sequence and the second feature sequence is in correspondence; an objective function value derivation step of deriving, based on the weight matrix, an objective function value that depends on the first feature sequence, the second feature sequence, and a label indicating whether the first sequence and the second sequence belong to the same class; and an update step of generating a learning result by executing a predetermined learning process based on the objective function value.

One aspect of the present invention is an inference method executed by an inference device, the method including: an attention step of using a first feature sequence based on a first sequence and a second feature sequence based on a second sequence to generate a weight matrix, which is a matrix representing the probability that each pair of elements of the first feature sequence and the second feature sequence is in correspondence; a collation step of deriving the distance between the first sequence and the second sequence based on the first feature sequence, the second feature sequence, and the weight matrix; and an inference step of generating an inference result by executing a predetermined inference process based on the distance.

One aspect of the present invention is a learning method executed by a learning device, the method including: an attention step of using a first feature sequence based on a first sequence and a second feature sequence based on a second sequence to generate a weight matrix, which is a matrix representing the probability that each pair of elements of the first feature sequence and the second feature sequence is in correspondence; a decoding step of deriving the element of the second sequence at the current time based on the first feature sequence and on the weight of each element of the first feature sequence with respect to the element of the second sequence at the current time; an objective function value derivation step of deriving an objective function value that depends on a ground-truth sequence and the second sequence; and an update step of generating a learning result by executing a predetermined learning process based on the objective function value.

One aspect of the present invention is an inference method executed by an inference device, the method including: an attention step of using a first feature sequence based on a first sequence and a second feature sequence based on a second sequence to generate a weight matrix, which is a matrix representing the probability that each pair of elements of the first feature sequence and the second feature sequence is in correspondence; a decoding step of deriving the second sequence based on the first feature sequence and the weight matrix; and an inference step of generating an inference result by executing a predetermined inference process based on the second sequence.

One aspect of the present invention is a program for causing a computer to function as the learning device described above.

One aspect of the present invention is a program for causing a computer to function as the inference device described above.

According to the present invention, it is possible to realize sequence alignment that can derive and use more complex feature representations without relying on manually designed feature representations, and that can at the same time derive and use a monotonic and continuous correspondence function.

FIG. 1 is a diagram showing a configuration example of the inference device in the first embodiment.
FIG. 2 is a diagram showing a configuration example of the learning device in the first embodiment.
FIG. 3 is a diagram showing an example of a correspondence sequence in the first embodiment.
FIG. 4 is a diagram showing a derivation example of the monotonicity constraint function value in the first embodiment.
FIG. 5 is a diagram showing a derivation example of the continuity constraint function value in the first embodiment.
FIG. 6 is a diagram showing a configuration example of the inference device in the second embodiment.
FIG. 7 is a diagram showing a configuration example of the learning device in the second embodiment.
FIG. 8 is a diagram showing a hardware configuration example of the inference device in each embodiment.
FIG. 9 is a diagram showing a hardware configuration example of the learning device in each embodiment.
FIG. 10 is a diagram showing examples of weight matrices.

Embodiments of the present invention will be described in detail with reference to the drawings.
In the following, an attention mechanism is used in applications such as sequence matching or classification. This realizes sequence alignment that can derive and use more complex feature representations without relying on manually designed feature representations.

In the following, a constraint function value that represents at least one of a monotonicity constraint and a continuity constraint is newly proposed. Minimizing the constraint function value guides the training of a mathematical model that includes the encoding unit and the attention mechanism, so that the attention mechanism can derive a monotonic and continuous correspondence function.

Hereinafter, the monotonicity constraint is the constraint that, when elements of the first sequence correspond to elements of the second sequence, the index of the first-sequence element that corresponds to a second-sequence element does not decrease as the index of the second-sequence element increases. The continuity constraint is the constraint that, when elements of the first sequence correspond to elements of the second sequence and adjacent elements of the second sequence have consecutive indices, the difference between the indices of the first-sequence elements that correspond to those adjacent elements is at most a predetermined positive value.
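The two definitions above can be made concrete with a small Python check on a list of matched indices. This sketch only counts violations of a hard correspondence function; it is not the constraint function value proposed in this document, which is derived from the weight matrix. The names `monotonicity_violations`, `continuity_violations`, and `max_jump`, and the use of the absolute difference, are assumptions for illustration.

```python
def monotonicity_violations(corr):
    """Count steps where the matched first-sequence index decreases."""
    return sum(1 for a, b in zip(corr, corr[1:]) if b < a)


def continuity_violations(corr, max_jump=1):
    """Count steps where matched indices of adjacent elements differ by more than max_jump."""
    return sum(1 for a, b in zip(corr, corr[1:]) if abs(b - a) > max_jump)


# corr[j] is the first-sequence index matched to second-sequence index j
smooth = [0, 0, 1, 2, 2, 3]
scrambled = [2, 1, 0, 4, 5, 3]

print(monotonicity_violations(smooth), continuity_violations(smooth))
print(monotonicity_violations(scrambled), continuity_violations(scrambled))
```

A correspondence function satisfies both constraints when both counts are zero, as for `smooth`; `scrambled` breaks each constraint at several steps.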

(First Embodiment)
In the first embodiment, the learning method and the inference method are applied to applications such as matching or classification. Examples of such applications include motion recognition, speech recognition, biosignal classification, and signature authentication.

In the learning stage, the learning device trains a mathematical model using the attention mechanism. That is, the learning device uses training data to train a mathematical model that has many parameters, and generates a trained mathematical model by determining the values of those parameters. In the execution stage, the inference device executes inference processing using the trained mathematical model. For example, the inference device performs a target task such as matching or classification.

First, the inference method applied to applications such as matching or classification in the execution stage will be described.

FIG. 1 is a diagram showing a configuration example of the inference device 1 in the first embodiment. In the execution stage of the first embodiment, the inference method is applied to applications such as matching or classification. The inference device 1 acquires a first sequence and a second sequence as inputs. In motion recognition, for example, the inference device 1 acquires as input a sequence in which the coordinates of multiple feature points on the human body (for example, joint positions) are arranged in time order. In signature authentication, the inference device 1 acquires as input a sequence in which the signature coordinates, pen pressure, or the like on the display of a signature collection device are arranged in time order. The inference device 1 derives the distance between the first sequence and the second sequence, executes inference processing based on that distance, and outputs the inference result to a predetermined external device (not shown).

The distance can be used to solve applications such as matching or classification. In a classification problem, for example, the inference device 1 derives the distance between a training sequence whose class is known and a target sequence whose class is unknown. The inference device 1 then estimates the class of the target sequence using the k-nearest neighbor method, a support vector machine, or the like. In a search problem, the inference device 1 derives the distance between a query sequence and each sequence in a database, and derives the sequence with the shortest distance as the search result.
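The classification and search uses of the distance can be sketched as follows. The distance function is passed in as a parameter because this section does not yet define how the inference device 1 computes it; `toy_distance` is a placeholder assumption that only works for equal-length numeric sequences, and `knn_classify` and `search` are hypothetical names.

```python
from collections import Counter


def knn_classify(target, labeled, distance, k=1):
    """Return the majority label among the k labeled sequences closest to target."""
    ranked = sorted(labeled, key=lambda item: distance(target, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def search(query, database, distance):
    """Return the database sequence with the shortest distance to the query."""
    return min(database, key=lambda seq: distance(query, seq))


def toy_distance(a, b):
    # Placeholder only: sum of absolute differences for equal-length sequences
    return sum(abs(x - y) for x, y in zip(a, b))


labeled = [([0, 1, 2], "up"), ([2, 1, 0], "down"), ([0, 1, 3], "up")]
print(knn_classify([3, 1, 0], labeled, toy_distance, k=1))
print(search([0, 1, 2.9], [seq for seq, _ in labeled], toy_distance))
```

Keeping the distance as a parameter means the same two helpers would work unchanged with whatever sequence distance a trained model supplies.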

The inference device 1 includes an encoding unit 10-1, an encoding unit 10-2, an attention mechanism 11, a collation unit 12, and an inference unit 13.

The functional units of the inference device 1 are described in detail below.
<Encoding unit 10>
The encoding unit 10-1 acquires the first sequence as input. The encoding unit 10-2 acquires the second sequence as input. The encoding unit 10-1 outputs a first feature sequence (first feature representation) to the attention mechanism 11 and the collation unit 12. The encoding unit 10-2 outputs a second feature sequence (second feature representation) to the attention mechanism 11 and the collation unit 12.

The encoding unit 10-1 operates in the same way as the encoding unit 10-2, so only the operation of the encoding unit 10-1 is described below. Where a matter is common to the encoding units 10-1 and 10-2, part of the reference sign is omitted and the unit is written as "encoding unit 10". Based on the first sequence, the encoding unit 10 derives, as the first feature sequence, a sequence whose elements are numerical values or numerical vectors.

<First example of encoding unit 10>
In the first example of the encoding unit 10, the encoding unit 10 derives the first feature sequence from the first sequence using an artificial neural network. In the learning stage, the parameters of the artificial neural network are determined based on training data.

The processing in the first example of the encoding unit 10 is detailed below.
In the first example of the encoding unit 10, the encoding unit 10 changes the length of the first sequence to a predetermined length (for example, 1024). This is necessary so that batch learning or mini-batch learning can be used when the artificial neural network is trained. Each element of the first sequence is a one-dimensional numerical value or a multidimensional numerical vector.

For each dimension of the elements of the length-adjusted first sequence, the encoding unit 10 normalizes all values in that dimension so that their mean becomes 0 and their variance becomes 1. The normalized first sequence is, for example, a "1×1024×5" tensor, where "1024" is an example of the sequence length and "5" is an example of the number of dimensions of each element.
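The per-dimension normalization can be sketched as below. This is a minimal sketch under stated assumptions: the function name `z_normalize` is hypothetical, a sequence is represented as a list of equal-length lists, and a dimension with zero variance is mapped to all zeros (the text does not say how that case is handled).

```python
import math


def z_normalize(sequence):
    """Normalize each dimension of a sequence to mean 0 and variance 1.

    sequence: list of T elements, each a list of K floats.
    """
    length, dims = len(sequence), len(sequence[0])
    normalized = [[0.0] * dims for _ in range(length)]
    for k in range(dims):
        column = [element[k] for element in sequence]
        mean = sum(column) / length
        variance = sum((v - mean) ** 2 for v in column) / length
        # Assumption: a constant dimension is left at 0 instead of dividing by 0.
        std = math.sqrt(variance) or 1.0
        for t in range(length):
            normalized[t][k] = (column[t] - mean) / std
    return normalized
```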

The encoding unit 10 inputs the normalized first sequence to a convolutional neural network. The convolutional neural network includes, for example, one "1×7×64" convolutional layer, one max pooling layer, and two "1×3×64" convolutional layers. A batch normalization layer immediately follows each convolutional layer, and a ReLU layer follows each batch normalization layer as the activation function. The final ReLU layer outputs a sequence whose elements are multidimensional numerical vectors.

For each element of the sequence whose elements are multidimensional numerical vectors, the encoding unit 10 normalizes the values of that element so that its L2 norm becomes 1. The encoding unit 10 outputs the normalized sequence as the first feature sequence to the attention mechanism 11 and the collation unit 12. In the first example of the encoding unit 10, a recurrent neural network or the like may be used instead of the convolutional neural network.
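The element-wise L2 normalization can be illustrated with a short sketch. The name `l2_normalize` and the small constant `eps`, which guards against an all-zero element, are assumptions that do not appear in the text.

```python
import math


def l2_normalize(sequence, eps=1e-12):
    """Scale every element (vector) of a sequence to unit L2 norm."""
    normalized = []
    for element in sequence:
        norm = math.sqrt(sum(v * v for v in element))
        # eps is an assumed safeguard for zero vectors.
        normalized.append([v / max(norm, eps) for v in element])
    return normalized
```

With unit-norm elements, the inner product of two elements equals their cosine similarity.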

<Second example of encoding unit 10>
In the second example of the encoding unit 10, the encoding unit 10 outputs the input first sequence as the first feature sequence to the attention mechanism 11 and the collation unit 12. In this example, the encoding unit 10 has no parameters.

<Attention mechanism 11>
The attention mechanism 11 acquires the first feature sequence from the encoding unit 10-1 and the second feature sequence from the encoding unit 10-2. Based on each element of the first feature sequence and each element of the second feature sequence, the attention mechanism 11 derives the weight of each element of the first feature sequence with respect to each element of the second feature sequence. This weight represents the probability that the two elements correspond to each other: the larger the weight, the higher that probability. The attention mechanism 11 outputs the weight matrix to the collation unit 12.

<First example of attention mechanism 11>
In the first example of the attention mechanism 11, the attention mechanism 11 uses an artificial neural network to derive, based on each element of the first feature sequence and each element of the second feature sequence, the weight of each element of the first feature sequence with respect to each element of the second feature sequence. In the learning stage, the parameters of the artificial neural network are determined based on training data.

The processing in the first example of the attention mechanism 11 is detailed below.
In the first example of the attention mechanism 11, the attention mechanism 11 concatenates the numerical vector that is each element of the first feature sequence with the numerical vector that is each element of the second feature sequence, along the dimension direction of the vectors. The attention mechanism 11 inputs the concatenated numerical vector to the artificial neural network.

The artificial neural network includes, for example, three fully connected layers: the first has 64 hidden units, the second has 16 hidden units, and the third has 1 hidden unit. A ReLU layer is provided as the activation function immediately after the first fully connected layer and immediately after the second fully connected layer. The third fully connected layer outputs a single real number.

For each element of the second feature sequence, the attention mechanism 11 uses the Softmax function to normalize the sequence of all real numbers derived from that element and each element of the first feature sequence. This sequence collects the real numbers output for each element of the first feature sequence, so it contains as many real numbers as the first feature sequence has elements. The attention mechanism 11 derives the normalized real numbers as the weights of the elements of the first feature sequence with respect to that element of the second feature sequence. The attention mechanism 11 outputs the matrix containing all of these weights to the collation unit 12 as the weight matrix.
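The first example of the attention mechanism can be sketched as a forward pass. This is an untrained illustration under stated assumptions: the helper names (`make_layer`, `dense`, `softmax`, `mlp_attention`), the random initialization range, and the feature dimension `K = 4` are hypothetical. Only the 64/16/1 layer sizes, the ReLU placement, the concatenation, and the Softmax normalization come from the text.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Randomly initialized fully connected layer (untrained placeholder)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(layer, vector, relu):
    """Apply one fully connected layer, optionally followed by ReLU."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, vector)) + b
           for row, b in zip(weights, biases)]
    return [max(0.0, o) for o in out] if relu else out


def softmax(scores):
    """Numerically stable Softmax over a list of real numbers."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mlp_attention(X, Y, layers):
    """Row i of the result holds the weights of x_1..x_W with respect to y_i."""
    P = []
    for y in Y:
        scores = []
        for x in X:
            h = x + y  # concatenate along the feature dimension
            h = dense(layers[0], h, relu=True)   # 64 hidden units + ReLU
            h = dense(layers[1], h, relu=True)   # 16 hidden units + ReLU
            scores.append(dense(layers[2], h, relu=False)[0])  # one real number
        P.append(softmax(scores))
    return P


rng = random.Random(0)
K = 4  # assumed feature dimension, for illustration only
layers = [make_layer(2 * K, 64, rng), make_layer(64, 16, rng), make_layer(16, 1, rng)]
```

Each row of the returned matrix is non-negative and sums to 1, so it can be read as a probability distribution over the elements of the first feature sequence.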

<Second example of attention mechanism 11>
The processing in the second example of the attention mechanism 11 is detailed below.
In the second example of the attention mechanism 11, the attention mechanism 11 derives the inner product of each element of the first feature sequence and each element of the second feature sequence. For each element of the second feature sequence, the attention mechanism 11 uses the Softmax function to normalize the sequence of all inner products between that element and each element of the first feature sequence. The attention mechanism 11 derives the normalized inner products as the weights of the elements of the first feature sequence with respect to that element of the second feature sequence. The attention mechanism 11 outputs the matrix containing all of these weights to the collation unit 12 as the weight matrix.
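The parameter-free variant reduces to inner products followed by a row-wise Softmax. A minimal sketch, with the hypothetical function names `softmax` and `dot_attention`:

```python
import math


def softmax(scores):
    """Numerically stable Softmax over a list of real numbers."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_attention(X, Y):
    """Row i holds the Softmax-normalized inner products of y_i with x_1..x_W."""
    return [softmax([sum(a * b for a, b in zip(x, y)) for x in X]) for y in Y]
```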

In the second example of the attention mechanism 11, the attention mechanism 11 has no parameters. A mathematical model that includes the encoding unit 10 and the attention mechanism 11 can be trained only if the model has parameters. Therefore, when the second example of the encoding unit 10 is used, the second example of the attention mechanism 11 cannot be used. That is, when an encoding unit 10 without parameters is used, an attention mechanism 11 without parameters cannot be used.

<Collation unit 12>
The collation unit 12 acquires the first feature sequence from the encoding unit 10-1, the second feature sequence from the encoding unit 10-2, and the weight matrix from the attention mechanism 11. Based on the first feature sequence, the second feature sequence, and the weight matrix, the collation unit 12 derives the distance between the first sequence and the second sequence. The collation unit 12 outputs this distance (distance information) to the inference unit 13. The collation unit 12 may also output the distance (distance information) to a predetermined external device (not shown).

<First example of collation unit 12>
In the first example of the collation unit 12, the collation unit 12 uses the weight matrix to weight each element of the first feature sequence. The collation unit 12 derives the new feature sequence obtained by this weighting as a transformed feature sequence. The collation unit 12 derives the distance between the transformed feature sequence and the second feature sequence as the distance between the first sequence and the second sequence.

The processing in the first example of the collation unit 12 is detailed below.
In the first example of the collation unit 12, for each element of the second feature sequence, the collation unit 12 derives the weighted sum of all elements of the first feature sequence, using the weights of the elements of the first feature sequence with respect to that element. As a result, the element of the first feature sequence that corresponds to each element of the second feature sequence is identified (extracted or generated) as a weighted sum. That is, the elements of the first feature sequence are aligned with the corresponding elements of the second feature sequence. This compensates for the non-linear temporal variation, caused by local shifts and changes in speed, that exists between the first sequence and the second sequence.

The collation unit 12 derives, as a local distance, the distance (for example, the Euclidean distance) between each element (numerical value or numerical vector) of the second feature sequence and the weighted sum (numerical value or numerical vector) of all elements of the first feature sequence derived for that element. Because the temporal variation between the first sequence and the second sequence has already been compensated, there is a high probability that each element of the second feature sequence corresponds to the weighted sum derived for it. By deriving the distance between each element of the second feature sequence and its weighted sum, the collation unit 12 can therefore derive a distance that more accurately represents the local differences between the first feature sequence and the second feature sequence.

The collation unit 12 derives the sum or average of the local distances over all elements of the second feature sequence. The collation unit 12 outputs this sum or average to the inference unit 13 as the distance between the first sequence and the second sequence. Here, the first feature sequence is written as "X ∈ R^{W×K}" and the second feature sequence as "Y ∈ R^{W×K}". "W" represents the length of the feature sequences. "K" represents the number of dimensions of the numerical value or numerical vector that is an element of a feature sequence. The j-th row vector "x_j ∈ R^{1×K}" of "X" represents the j-th element of "X". Similarly, the i-th row vector "y_i ∈ R^{1×K}" of "Y" represents the i-th element of "Y".

The weight matrix is written as "P ∈ R^{W×W}". The i-th row vector "p_i ∈ R^{1×W}" of "P" contains the weights "p_{i1}, ..., p_{iW}" of "x_1, ..., x_W" with respect to "y_i". The j-th element "p_{ij}" of "p_i" represents the weight of "x_j" with respect to "y_i".

Since "p_i" is normalized by the Softmax function, the sum of "p_{i1}, ..., p_{iW}" is 1. Therefore, the distance between the first sequence and the second sequence is expressed by equation (1).
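Equation (1) is not reproduced in this text. A reconstruction consistent with the surrounding definitions, assuming the sum (rather than the average) of the local distances and using the hypothetical symbol D for the distance, is:

```latex
D(X, Y) = \sum_{i=1}^{W} \left\| p_i X - y_i \right\|_2
\tag{1}
```

The average variant would divide the right-hand side by W.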

Here, "p_i X" represents the weighted sum of "x_1, ..., x_W" with respect to "y_i". "||p_i X − y_i||" represents the Euclidean distance between "p_i X" and "y_i", that is, the local distance.
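The first example of the collation unit can be sketched in a few lines. The function names and the `average` flag are hypothetical; the flag selects between the sum and the average of the local distances, both of which the text allows.

```python
import math


def weighted_sum(weights, X):
    """Compute p_i X: the weighted sum of the rows x_1..x_W of X."""
    dims = len(X[0])
    return [sum(w * x[k] for w, x in zip(weights, X)) for k in range(dims)]


def euclidean(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def distance_first_example(X, Y, P, average=False):
    """Sum (or average) over i of the local distances ||p_i X - y_i||."""
    local = [euclidean(weighted_sum(p_i, X), y_i) for p_i, y_i in zip(P, Y)]
    total = sum(local)
    return total / len(local) if average else total
```

For example, with X = [[0, 0], [2, 0]], Y = [[1, 0], [2, 0]] and P = [[0.5, 0.5], [0, 1]], each weighted sum coincides with the matching element of Y, so the distance is 0.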

<Second example of collation unit 12>
In the second example of the collation unit 12, the collation unit 12 derives the distance between each element of the first feature sequence and each element of the second feature sequence. The collation unit 12 uses the weight matrix to weight these distances. The collation unit 12 derives the distance between the first sequence and the second sequence based on the weighted distances.

The processing in the second example of the collation unit 12 is detailed below.
In the second example of the collation unit 12, the collation unit 12 derives, as a local distance, the distance (for example, the Euclidean distance) between each element of the first feature sequence and each element of the second feature sequence. The collation unit 12 uses the weight matrix to derive the weighted sum or weighted average of the local distances. The collation unit 12 outputs this weighted sum or weighted average to the inference unit 13 as the distance between the first sequence and the second sequence.

The weight of each element of the first feature sequence with respect to each element of the second feature sequence represents the probability that the two elements correspond to each other. The larger the weight, the higher that probability. For two elements that are likely to correspond, the collation unit 12 gives a larger weight to the local distance between them. For two elements that are unlikely to correspond, the collation unit 12 gives a smaller weight to the local distance between them.

This compensates for the non-linear temporal variation, caused by local shifts and changes in speed, that exists between the first sequence and the second sequence. The distance between the first sequence and the second sequence is also derived more accurately.

As in the first example of the collation unit 12, in the second example the first feature sequence is written as "X ∈ R^{W×K}" and the second feature sequence as "Y ∈ R^{W×K}". The length of the feature sequences is written as "W". The j-th element of "X" is written as "x_j ∈ R^{1×K}", and the i-th element of "Y" is written as "y_i ∈ R^{1×K}". The weight matrix is written as "P ∈ R^{W×W}". The weight of "x_j" with respect to "y_i" is written as "p_{ij} ∈ P". Therefore, the distance between the first sequence and the second sequence is expressed by equation (2).
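Equation (2) is not reproduced in this text. A reconstruction consistent with the surrounding definitions, assuming the weighted sum (rather than the weighted average) of the local distances and using the hypothetical symbol D for the distance, is:

```latex
D(X, Y) = \sum_{i=1}^{W} \sum_{j=1}^{W} p_{ij} \left\| x_j - y_i \right\|_2
\tag{2}
```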

Here, "||x_j − y_i||" represents the Euclidean distance between "x_j" and "y_i", that is, the local distance.
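The second example of the collation unit weights the pairwise local distances instead of the features. A minimal sketch follows. The function names and the `average` flag are hypothetical, and dividing by W to obtain the weighted average is an assumption that relies on each row of P summing to 1.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def distance_second_example(X, Y, P, average=False):
    """Weighted sum (or average) of the local distances ||x_j - y_i||."""
    total = 0.0
    for p_i, y_i in zip(P, Y):
        total += sum(p_ij * euclidean(x_j, y_i) for p_ij, x_j in zip(p_i, X))
    # Assumption: every row of P sums to 1, so the total weight equals len(Y).
    return total / len(Y) if average else total
```

Because each row of P sums to 1 and the Euclidean norm is convex, this value is never smaller than the first-example distance computed from the same X, Y, and P.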

<Inference unit 13>
The inference unit 13 acquires, from the collation unit 12, the distance between the first sequence and the second sequence. The inference unit 13 executes inference processing based on this distance and outputs the inference result to a predetermined external device (not shown). The inference processing is not limited to any specific processing. For example, when the writer of a handwritten signature is to be inferred from among multiple writers, a signature whose writer is unknown (first sequence) and signatures whose writers are known (second sequences) are input to the trained mathematical model. The inference unit 13 outputs, as the writer ID of the first sequence (inference result), the writer ID (identification number) of the second sequence whose distance to the first sequence, as acquired from the collation unit 12, is the shortest. When multiple second sequences exist for each writer, the inference unit 13 may output the writer ID with the smallest average distance as the inference result.
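The writer-identification example can be sketched as follows, for the case where each writer has several reference signatures. The function name `infer_writer` and the input layout, a list of (writer ID, distance) pairs, are assumptions made for illustration.

```python
from collections import defaultdict


def infer_writer(distances_with_ids):
    """Return the writer ID whose reference signatures have the smallest mean distance.

    distances_with_ids: iterable of (writer_id, distance) pairs, one per
    reference signature compared against the unknown signature.
    """
    per_writer = defaultdict(list)
    for writer_id, distance in distances_with_ids:
        per_writer[writer_id].append(distance)
    return min(per_writer, key=lambda w: sum(per_writer[w]) / len(per_writer[w]))
```

Averaging can change the result relative to picking the single nearest signature: with pairs ("A", 0.9), ("A", 0.2), ("B", 0.5), ("B", 0.4), the nearest signature belongs to "A", but "B" has the smaller mean distance and is returned.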

Next, the learning method used in the learning stage for applied problems such as matching or classification is described.

FIG. 2 is a diagram showing a configuration example of the learning device 2 in the first embodiment. In the learning stage of the first embodiment, the learning method is applied to applied problems such as matching or classification. The learning device 2 acquires the first sequence, the second sequence, and a label as inputs. The learning device 2 derives an objective function value and a constraint function value. Based on the objective function value and the constraint function value, the learning device 2 outputs a trained mathematical model (learning result) to a predetermined external device (not shown). The learning device 2 also outputs the trained mathematical model to the inference device 1 before the execution stage.

The first sequence, the second sequence, and the label are training data that the learning device 2 uses to train a mathematical model for performing a task with a predetermined purpose (for example, matching or classification). The label indicates whether or not the first sequence and the second sequence belong to the same class. The objective function value and the constraint function value are used by the learning device 2 to train the mathematical model. For example, the learning device 2 updates the parameters of the mathematical model so that the weighted sum or weighted average of the objective function values and constraint function values derived from a large amount of training data becomes as small as possible (for example, minimal). The more training data there are, the better the performance of the mathematical model. The number of training data items is, for example, about 20,000 to 30,000.

The learning device 2 includes an encoding unit 20-1, an encoding unit 20-2, an attention mechanism 21, an objective function value derivation unit 22, a constraint function value derivation unit 23, and an update unit 24.

The functional units of the learning device 2 are described in detail below.
<Encoding unit 20>
The encoding unit 20-1 acquires the first sequence as input. The encoding unit 20-2 acquires the second sequence as input. The operation of the encoding unit 20-1 is the same as that of the encoding unit 20-2. The processing of the encoding unit 20-1 in the learning stage is the same as the processing of the encoding unit 10-1 in the execution stage. The processing of the encoding unit 20-2 in the learning stage is the same as the processing of the encoding unit 10-2 in the execution stage.

The encoding unit 20-1 outputs the first feature sequence to the attention mechanism 21 and the objective function value derivation unit 22. The encoding unit 20-2 outputs the second feature sequence to the attention mechanism 21 and the objective function value derivation unit 22. In the following, matters common to the encoding unit 20-1 and the encoding unit 20-2 are described with part of the reference sign omitted, as "encoding unit 20".

<Attention mechanism 21>
The attention mechanism 21 acquires the first feature sequence from the encoding unit 20-1 and the second feature sequence from the encoding unit 20-2. The processing of the attention mechanism 21 in the learning stage is the same as the processing of the attention mechanism 11 in the execution stage. The attention mechanism 21 outputs the weight matrix to the objective function value derivation unit 22 and the constraint function value derivation unit 23.

<Objective function value derivation unit 22>
The objective function value derivation unit 22 acquires the label as input. The objective function value derivation unit 22 acquires the first feature sequence and the second feature sequence from the encoding unit 20, and the weight matrix from the attention mechanism 21. Based on the first feature sequence, the second feature sequence, and the weight matrix, the objective function value derivation unit 22 derives the difference between the first feature sequence and the second feature sequence. The objective function value derivation unit 22 derives the objective function value so that the derived difference is associated with the label.

When the first sequence and the second sequence belong to the same class, the larger the difference, the larger the objective function value. When the first sequence and the second sequence belong to different classes, the smaller the difference, the larger the objective function value. The objective function value derivation unit 22 outputs this objective function value to the update unit 24.

<First example of objective function value derivation unit 22>
When the first example of the collation unit 12 is used in the execution stage, it is preferable to use the first example of the objective function value derivation unit 22, rather than its second example, in the learning stage. In the first example of the objective function value derivation unit 22, the objective function value derivation unit 22 uses the weight matrix to weight each element of the first feature sequence. The objective function value derivation unit 22 derives the new feature sequence obtained by this weighting as a transformed feature sequence. The objective function value derivation unit 22 derives the difference between the transformed feature sequence and the second feature sequence. The objective function value derivation unit 22 derives the objective function value so that the derived difference is associated with the label.

The processing in the first example of the objective function value derivation unit 22 is detailed below.
In the first example of the objective function value derivation unit 22, for each element of the second feature sequence, the objective function value derivation unit 22 derives the weighted sum of all elements of the first feature sequence, using the weights of the elements of the first feature sequence with respect to that element.

As a result, the element of the first feature sequence that corresponds to each element of the second feature sequence is identified (extracted or generated) as a weighted sum. That is, the elements of the first feature sequence are aligned with the corresponding elements of the second feature sequence. This compensates for the non-linear temporal variation, caused by local shifts and changes in speed, that exists between the first sequence and the second sequence.

The objective function value derivation unit 22 derives, as a local distance, the distance (for example, the Euclidean distance) between the weighted sum (numerical value or numerical vector) of all elements of the first feature sequence and each element (numerical value or numerical vector) of the second feature sequence. The objective function value derivation unit 22 uses the local distance to derive a local objective function value. When the first sequence and the second sequence belong to the same class, the longer the local distance, the larger the local objective function value. When the first sequence and the second sequence belong to different classes, the shorter the local distance, the larger the local objective function value.

The objective function value derivation unit 22 derives the sum or average of the local objective function values over all elements of the second feature sequence. The objective function value derivation unit 22 outputs this sum or average to the update unit 24 as the objective function value. Here, the first feature sequence is written as "X ∈ R^{W×K}". The second feature sequence is written as "Y ∈ R^{W×K}". The length of the feature sequences is written as "W". The j-th element of "X" is written as "x_j ∈ R^{1×K}". The i-th element of "Y" is written as "y_i ∈ R^{1×K}".

The weight matrix is written as "P ∈ R^{W×W}". The i-th row vector "p_i ∈ R^{1×W}" of "P" contains the weights "p_{i1}, ..., p_{iW}" of "x_1, ..., x_W" with respect to "y_i". The label is written as "z ∈ {0, 1}". When the first sequence and the second sequence belong to the same class, the label is "z = 1". When they belong to different classes, the label is "z = 0". Therefore, the objective function value is expressed by equation (3).

Here, "p_i X" represents the weighted sum of "x_1, ..., x_W" with respect to "y_i". "||p_i X − y_i||" represents the Euclidean distance between "p_i X" and "y_i", that is, the local distance. "τ" is a hyperparameter and is a positive real number.
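The quantities defined above can be illustrated with a short sketch. The helper names (`weighted_sum`, `local_distances`, `objective_first_example`) are hypothetical. Because Eq. (3) is not reproduced in this text, the margin-based form used below for the local objective function value is only an assumption that is consistent with the description (it grows with the local distance for same-class pairs, grows as the local distance shrinks for different-class pairs, and uses τ as a positive margin); it is not necessarily the formula of Eq. (3).

```python
import math


def weighted_sum(p_row, X):
    """Return p_i X: the weighted sum of the rows x_1..x_W of X with weights p_row."""
    K = len(X[0])
    return [sum(p * x[k] for p, x in zip(p_row, X)) for k in range(K)]


def local_distances(P, X, Y):
    """Local distance ||p_i X - y_i|| for every element y_i of Y."""
    return [math.dist(weighted_sum(p_row, X), y) for p_row, y in zip(P, Y)]


def objective_first_example(P, X, Y, z, tau=1.0):
    """Average local objective value.

    ASSUMPTION: margin-based form, not taken from the patent's Eq. (3).
      z == 1 (same class):      local value = d
      z == 0 (different class): local value = max(tau - d, 0)
    """
    distances = local_distances(P, X, Y)
    local = [d if z == 1 else max(tau - d, 0.0) for d in distances]
    return sum(local) / len(local)
```

With an identity weight matrix and identical feature arrays, every local distance is 0, so the sketch returns the smallest possible value for a same-class pair and a value equal to the margin for a different-class pair.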

In the learning stage, the update unit 24 updates the parameters of the mathematical model, which includes the coding unit 20 and the attention mechanism 21, so that the weighted sum or weighted average of the objective function value and the constraint function value derived from a large amount of training data becomes as small as possible (for example, is minimized). Minimizing the objective function value updates the parameters so that the mathematical model derives smaller local distances when the first array and the second array belong to the same class.

The parameters are updated, based on the objective function value of the first example of the objective function value derivation unit 22, so that when the first array and the second array belong to the same class, the mathematical model can more correctly identify the elements of the first feature array that are similar to each element of the second feature array. In other words, the parameters are updated so that, for a same-class pair, the mathematical model can more correctly identify the element of the first feature array that corresponds to each element of the second feature array.

Using the mathematical model trained in this way, the correspondence between each element of the first feature array and each element of the second feature array is identified more correctly, and the distance between the first array and the second array is derived more correctly. In addition, a sequence alignment is realized that does not depend on manually designed feature representations and that can derive and use more complex feature representations than dynamic time warping.

<Second example of objective function value derivation unit 22>
When the second example of the collation unit 12 is used in the execution stage, it is preferable to use the second example of the objective function value derivation unit 22, rather than the first example, in the learning stage. In the second example, the objective function value derivation unit 22 derives the distance between each element of the first feature array and each element of the second feature array. The objective function value derivation unit 22 weights these distances using the weight matrix and derives the similarity between the first feature array and the second feature array. The objective function value derivation unit 22 derives the objective function value so that the derived similarity is associated with the label.

The details of the processing of the second example of the objective function value derivation unit 22 are as follows.
In the second example, the objective function value derivation unit 22 derives, as a local distance, the distance (for example, the Euclidean distance) between each element of the first feature array and each element of the second feature array. The objective function value derivation unit 22 uses the weight matrix to derive the weighted sum or weighted average of the local distances. The objective function value derivation unit 22 derives the objective function value so that the derived weighted sum or weighted average is associated with the label.

Here, the first feature array is written as "X ∈ R^{W×K}". The second feature array is written as "Y ∈ R^{W×K}". The length of the feature arrays is written as "W". The j-th element of "X" is written as "x_j ∈ R^{1×K}". The i-th element of "Y" is written as "y_i ∈ R^{1×K}". The weight matrix is written as "P ∈ R^{W×W}". The weight of "x_j" with respect to "y_i" is written as "p_ij ∈ P". The label is written as "z ∈ {0, 1}". When the first array and the second array belong to the same class, the label is "z = 1". When the first array and the second array belong to different classes, the label is "z = 0". Accordingly, the similarity between the first feature array and the second feature array is expressed as in Eq. (4).

Here, "||x_j − y_i||" represents the Euclidean distance between "x_j" and "y_i", that is, the local distance. The objective function value is expressed as in Eq. (5).
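As a rough illustration of the weighting described above, the following sketch computes the weighted sum (or weighted average) of the local distances "||x_j − y_i||" using the weights "p_ij". The function name `weighted_distance` and its `average` flag are hypothetical. The exact forms of Eq. (4) and Eq. (5) are not reproduced in this text, so the step that maps this quantity and the label z to the objective function value is deliberately left out.

```python
import math


def weighted_distance(P, X, Y, average=False):
    """Weighted sum over all pairs (i, j) of p_ij * ||x_j - y_i||.

    P[i][j] is the weight of x_j with respect to y_i.
    With average=True the sum is divided by the total weight
    (a weighted average); this option is an illustrative assumption.
    """
    total = sum(
        P[i][j] * math.dist(X[j], Y[i])
        for i in range(len(Y))
        for j in range(len(X))
    )
    if average:
        return total / sum(sum(row) for row in P)
    return total
```

When P concentrates its weight on pairs of elements that are close to each other, the returned value is small; weight placed on distant pairs increases it.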

In the learning stage, the update unit 24 updates the parameters of the mathematical model, which includes the coding unit 20 and the attention mechanism 21, so that the objective function value derived from a large amount of training data becomes as small as possible (for example, is minimized). Minimizing the objective function value updates the parameters so that the mathematical model derives smaller local distances when the first array and the second array belong to the same class.

When the first array and the second array belong to the same class, the update unit 24 updates the parameters of the mathematical model so that a larger weight is derived for two elements that have a high probability of corresponding to each other, and a smaller weight is derived for two elements that have a low probability of corresponding to each other. That is, the parameters of the mathematical model are updated so that the correspondence between each element of the first feature array and each element of the second feature array can be identified more correctly.

Using the mathematical model trained in this way, the correspondence between each element of the first feature array and each element of the second feature array is identified more correctly, and the distance between the first array and the second array can be derived more correctly. In addition, it is possible to realize a sequence alignment that does not depend on manually designed feature representations and that can derive and use more complex feature representations than dynamic time warping.

<Constraint function value derivation unit 23>
The constraint function value derivation unit 23 acquires the weight matrix from the attention mechanism 21 and uses it to derive a constraint function value. The constraint function value derivation unit 23 derives the constraint function value so that it becomes smaller as the degree to which at least one of the monotonicity constraint and the continuity constraint is satisfied becomes larger. The constraint function value derivation unit 23 outputs the constraint function value to the update unit 24.

By minimizing the constraint function value, the mathematical model including the coding unit 20 and the attention mechanism 21 is trained to derive a weight matrix for which the correspondence between each element of the first feature array and each element of the second feature array satisfies at least one of the monotonicity constraint and the continuity constraint.

The details of the processing of the constraint function value derivation unit 23 are as follows.
The weight matrix represents the probability that each element of the first feature array and each element of the second feature array correspond to each other; it is not the correspondence itself. Therefore, the degree to which at least one of the monotonicity constraint and the continuity constraint is satisfied cannot be evaluated directly from the weight matrix.

To evaluate the degree to which at least one of the monotonicity constraint and the continuity constraint is satisfied, the weight matrix must be converted into a form resembling a correspondence function. This correspondence function is, for example, a function whose independent variable is the index of each element of the second feature array and whose dependent variable is the index of the element of the first feature array that corresponds to that element.

The constraint function value derivation unit 23 therefore derives the product of the weight matrix and a predetermined arithmetic progression as a corresponding array. An arithmetic progression is a sequence in which every pair of adjacent elements has the same difference.

FIG. 3 is a diagram showing examples of the corresponding array in the first embodiment. The upper part of FIG. 3 shows an example of the weight matrix, an example of the arithmetic progression, and an example of the corresponding array for the case where the monotonicity constraint and the continuity constraint are satisfied. The lower part of FIG. 3 shows the same three examples for the case where the monotonicity constraint and the continuity constraint are not satisfied. In each case, the left side of the equal sign shows the product of the weight matrix and the arithmetic progression "[1, 2, 3, 4]^T". Each row of the weight matrix is normalized so that its elements sum to 1. The right side of the equal sign shows the corresponding array.

The index of each element of the corresponding array derived using the arithmetic progression represents the index (number) of an element of the second feature array. The numerical value of each element of the corresponding array represents the index (number) of the element of the first feature array that corresponds to that element of the second feature array. The numerical value may instead be a value proportional to the index of the corresponding element of the first feature array.

In FIG. 3, the corresponding array is derived using the weight matrix and the arithmetic progression. For example, in the upper part of FIG. 3, the corresponding array indicates that the first element of the second feature array corresponds to the first element of the first feature array, that the second element of the second feature array corresponds to the second element of the first feature array, and that the third element of the second feature array corresponds to the second element of the first feature array.

The index of the element of the first feature array that corresponds to the fourth element of the second feature array is expressed not as an integer but as the real number "3.6". Using such a corresponding array makes it possible to evaluate the degree to which at least one of the monotonicity constraint and the continuity constraint is satisfied.
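The derivation of the corresponding array can be sketched as a matrix-vector product. The function name `corresponding_array` is hypothetical, and the weight matrix below is not the one drawn in FIG. 3; it is an illustrative row-normalized matrix chosen so that the result matches the values mentioned in the text (1, 2, 2 and approximately 3.6).

```python
def corresponding_array(P, progression=None):
    """Return F = P * a, where a defaults to the arithmetic progression [1, ..., W]^T."""
    W = len(P[0])
    if progression is None:
        progression = list(range(1, W + 1))
    return [sum(p * a for p, a in zip(row, progression)) for row in P]


# Hypothetical weight matrix (each row sums to 1); row i holds the weights
# of x_1..x_4 with respect to y_i.
P_example = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.4, 0.6],
]
F_example = corresponding_array(P_example)  # approximately [1, 2, 2, 3.6]
```

Because each row of the weight matrix sums to 1, every element of the corresponding array is an expected index lying between 1 and W, which is why non-integer values such as 3.6 can appear.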

The constraint function value derived from the corresponding array must become smaller as the degree to which at least one of the monotonicity constraint and the continuity constraint is satisfied becomes larger. For the learning device 2 to train the mathematical model with a gradient method, it is desirable that the constraint function value be differentiable with respect to the weight matrix or the corresponding array. To enable faster learning, it is also desirable that the derivation of the constraint function value be easy to parallelize.

The constraint function value derivation unit 23 derives at least one of a monotonicity constraint function value and a continuity constraint function value as the constraint function value.

<Monotonicity constraint function value>
For each element of the corresponding array, the constraint function value derivation unit 23 compares that element with the element immediately preceding it, and thereby derives the function value of a local monotonicity constraint (hereinafter referred to as the "local monotonicity constraint function value"). When the preceding element is larger than the current element, the local monotonicity constraint function value is the absolute value of the difference between the two elements. When the preceding element is less than or equal to the current element, the local monotonicity constraint function value is 0.

The constraint function value derivation unit 23 derives the sum or average of the local monotonicity constraint function values over all the elements of the corresponding array, and outputs this sum or average to the update unit 24 as the monotonicity constraint function value.

Here, the weight matrix is written as "P ∈ R^{W×W}". The length of the feature arrays is written as "W". The corresponding array is written as "F ∈ R^{W×1}". The i-th element of "F" is written as "f_i". Accordingly, the monotonicity constraint function value is expressed as in Eq. (6).

Here, "f_0" is 0. Implementing Eq. (6) with a convolutional neural network library allows the monotonicity constraint function value to be derived faster.
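Based on the description above (a local value equal to the difference when the preceding element is larger and 0 otherwise, combined over all elements, with f_0 = 0), Eq. (6) can plausibly be written as follows. The symbol L_mono and the 1/W normalization are assumptions made for this reconstruction; the description equally allows a plain sum instead of an average.

```latex
L_{\mathrm{mono}} \;=\; \frac{1}{W} \sum_{i=1}^{W} \max\bigl(f_{i-1} - f_i,\; 0\bigr),
\qquad f_0 = 0 .
```

Each term is positive only where the corresponding array decreases, so a non-decreasing corresponding array gives a value of 0.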

FIG. 4 is a diagram showing examples of deriving the monotonicity constraint function value in the first embodiment. The upper part of FIG. 4 shows a derivation example for the case where the monotonicity constraint and the continuity constraint are satisfied. The lower part of FIG. 4 shows a derivation example for the case where the monotonicity constraint and the continuity constraint are not satisfied.

FIG. 4 shows, from left to right, an example of the corresponding array, an example of the filter, the differences between adjacent elements of the corresponding array, an example of the local monotonicity constraint function values, and an example of the monotonicity constraint function value. In FIG. 4, the circled "×" symbol denotes convolution, and "loss" denotes the monotonicity constraint function value. The more closely the corresponding array satisfies the monotonicity constraint, the smaller the derived monotonicity constraint function value; the less closely it satisfies the constraint, the larger the derived value.

In FIG. 4, convolving the corresponding array with the filter "[1, −1]^T" yields the differences between adjacent elements of the corresponding array. The constraint function value derivation unit 23 applies "ReLU" as an activation function to the array of differences, which yields the local monotonicity constraint function values. Taking the average of all the elements of the array of local monotonicity constraint function values then readily yields a monotonicity constraint function value of the form of Eq. (6).
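The filter, ReLU and averaging steps described above can be sketched in plain Python. The function names are hypothetical, and the sketch makes two assumptions: the "convolution" is the sliding dot product that neural network libraries compute (so the filter [1, −1] yields f_{i−1} − f_i), and f_0 = 0 is prepended to the corresponding array so that there is one difference per element.

```python
def correlate_valid(values, kernel):
    """Sliding dot product without padding (what CNN libraries call convolution)."""
    n, k = len(values), len(kernel)
    return [
        sum(kernel[t] * values[i + t] for t in range(k))
        for i in range(n - k + 1)
    ]


def relu(values):
    """Element-wise max(v, 0)."""
    return [max(v, 0.0) for v in values]


def monotonicity_loss(F, kernel=(1.0, -1.0), f0=0.0):
    """Average of ReLU(f_{i-1} - f_i) over the corresponding array F."""
    padded = [f0] + list(F)                  # assumption: prepend f_0 = 0
    diffs = correlate_valid(padded, kernel)  # f_{i-1} - f_i for i = 1..W
    local = relu(diffs)                      # local monotonicity values
    return sum(local) / len(local)
```

A non-decreasing array such as [1, 2, 2, 3.6] produces no positive differences and therefore a value of 0, whereas an array that moves backwards, such as the hypothetical [3, 1, 2, 1], is penalized in proportion to the size of each backward step.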

The filter may be any filter that can derive the difference between two elements located close to each other in the corresponding array. For example, a filter such as "[1, 0, −1]^T" or "[2, 1, −1, −2]^T" may be used instead of "[1, −1]^T".

<Continuity constraint function value>
For each element of the corresponding array, the constraint function value derivation unit 23 derives the absolute value of the difference between that element and the element immediately preceding it. The constraint function value derivation unit 23 then subtracts a predetermined positive number from the derived absolute value. This predetermined positive number is a hyperparameter and is, for example, a positive integer such as 1, 2 or 3. A real number such as "1.5" may also be used as the hyperparameter.

The constraint function value derivation unit 23 derives the larger of the subtraction result and 0 as the function value of a local continuity constraint (hereinafter referred to as the "local continuity constraint function value"). The constraint function value derivation unit 23 derives the sum or average of the local continuity constraint function values over all the elements of the corresponding array, and outputs this sum or average to the update unit 24 as the continuity constraint function value.

The weight matrix is written as "P ∈ R^{W×W}". The length of the feature arrays is written as "W". The corresponding array is written as "F ∈ R^{W×1}". The i-th element of "F" is written as "f_i". Accordingly, the continuity constraint function value is expressed as in Eq. (7).

Here, "f_0" is 0. Implementing Eq. (7) with a convolutional neural network library allows the continuity constraint function value to be derived faster.
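Based on the description above (the absolute difference between adjacent elements, minus a predetermined positive number, clipped at 0 and combined over all elements, with f_0 = 0), Eq. (7) can plausibly be written as follows. The symbols L_cont and ε (the predetermined positive number) and the 1/W normalization are assumptions made for this reconstruction; the description equally allows a plain sum instead of an average.

```latex
L_{\mathrm{cont}} \;=\; \frac{1}{W} \sum_{i=1}^{W} \max\bigl(\lvert f_i - f_{i-1} \rvert - \epsilon,\; 0\bigr),
\qquad f_0 = 0,\quad \epsilon > 0 .
```

A term is positive only where the corresponding array jumps by more than ε between adjacent elements, so ε acts as the largest step that is not penalized.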

FIG. 5 is a diagram showing examples of deriving the continuity constraint function value in the first embodiment. The upper part of FIG. 5 shows a derivation example for the case where the monotonicity constraint and the continuity constraint are satisfied. The lower part of FIG. 5 shows a derivation example for the case where the monotonicity constraint and the continuity constraint are not satisfied.

FIG. 5 shows, from left to right, an example of the corresponding array, an example of the filter, an example of the predetermined positive number, the result of subtracting the predetermined positive number from the absolute value of the difference between adjacent elements of the corresponding array, an example of the local continuity constraint function values, and an example of the continuity constraint function value. In FIG. 5, the circled "×" symbol denotes convolution, and "loss" denotes the continuity constraint function value.

In FIG. 5, convolving the corresponding array with the filter "[−1, 1]^T" yields the differences between adjacent elements of the corresponding array. The constraint function value derivation unit 23 derives the absolute value of each element of the array of differences and subtracts the predetermined positive number (1 in FIG. 5) from each absolute value. The constraint function value derivation unit 23 applies "ReLU" as an activation function to the array of subtraction results, which yields the local continuity constraint function values. Taking the average of all the elements of the array of local continuity constraint function values then readily yields a continuity constraint function value of the form of Eq. (7).
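The steps described above (difference filter, absolute value, subtraction of the predetermined positive number, ReLU, average) can be sketched in plain Python. The function name `continuity_loss` and the parameter name `margin` (standing for the predetermined positive number) are hypothetical, and prepending f_0 = 0 to the corresponding array is an assumption made so that there is one difference per element.

```python
def continuity_loss(F, margin=1.0, kernel=(-1.0, 1.0), f0=0.0):
    """Average of ReLU(|f_i - f_{i-1}| - margin) over the corresponding array F."""
    padded = [f0] + list(F)  # assumption: prepend f_0 = 0
    k = len(kernel)
    # Sliding dot product with [-1, 1]: f_i - f_{i-1} for i = 1..W.
    diffs = [
        sum(kernel[t] * padded[i + t] for t in range(k))
        for i in range(len(padded) - k + 1)
    ]
    # Absolute value, subtract the margin, then ReLU.
    local = [max(abs(d) - margin, 0.0) for d in diffs]
    return sum(local) / len(local)
```

With a margin of 1, a hypothetical array such as [1, 2, 2, 3], whose adjacent elements never differ by more than 1, gives a value of 0, while an array with larger jumps, such as [1, 4, 2, 4], is penalized by the amount each jump exceeds the margin.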

As shown in FIG. 5, the more closely the corresponding array satisfies the continuity constraint, the smaller the derived continuity constraint function value; the less closely it satisfies the constraint, the larger the derived value.

<Update unit 24>
The update unit 24 acquires the objective function value from the objective function value derivation unit 22 and the constraint function value from the constraint function value derivation unit 23. The update unit 24 executes a learning process based on the objective function value and the constraint function value. The learning process is not limited to any specific learning process. The update unit 24 updates the parameters of the mathematical model, which includes the coding unit 20 and the attention mechanism 21, so that the weighted sum or weighted average of the constraint function value and the objective function value becomes as small as possible (for example, is minimized). The update unit 24 outputs the trained mathematical model (learning result) to a predetermined external device (not shown).
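The quantity that the update unit 24 minimizes can be sketched as a weighted combination of the function values. The function name `total_loss` and the weight parameters `w_obj`, `w_mono` and `w_cont` are hypothetical; the text does not specify how the weights are chosen, so they appear here simply as hyperparameters.

```python
def total_loss(objective, monotonicity=0.0, continuity=0.0,
               w_obj=1.0, w_mono=1.0, w_cont=1.0, average=False):
    """Weighted sum (or weighted average) of the objective and constraint values.

    The weights are illustrative hyperparameters, not values from the patent.
    """
    total = w_obj * objective + w_mono * monotonicity + w_cont * continuity
    if average:
        return total / (w_obj + w_mono + w_cont)
    return total
```

Setting `w_mono` or `w_cont` to 0 corresponds to using only one of the two constraints, which the description allows ("at least one of" the monotonicity constraint and the continuity constraint).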

As described above, in the learning stage, the attention mechanism 21 uses the first feature array based on the first array and the second feature array based on the second array to generate a weight matrix, which is a matrix representing the probability that the elements of the first feature array and the second feature array correspond to each other. The objective function value derivation unit 22 derives, based on the weight matrix, an objective function value that depends on the first feature array, the second feature array, and a label indicating whether the first array and the second array belong to the same class. The constraint function value derivation unit 23 derives, based on the weight matrix, a constraint function value representing at least one of the monotonicity constraint and the continuity constraint. The update unit 24 generates a learning result by executing a predetermined learning process based on the objective function value and the constraint function value. The objective function value is, for example, a value that depends on the label and on the difference or similarity between the first feature array and the second feature array. The update unit 24 updates the mathematical model.

The mathematical model updated in the learning stage is used to execute inference processing in the execution stage. In the execution stage, the attention mechanism 11 uses the first feature array based on the first array and the second feature array based on the second array to generate a weight matrix, which is a matrix representing the probability that the elements of the first feature array and the second feature array correspond to each other. The collation unit 12 derives the distance between the first array and the second array based on the first feature array, the second feature array, and the weight matrix. The inference unit 13 generates an inference result by executing predetermined inference processing based on the distance.

In this way, the coding unit derives the feature arrays using a mathematical model trained with a constraint function value representing at least one of the monotonicity constraint and the continuity constraint, so that the attention mechanism generates an effective weight matrix from the feature arrays.

This makes it possible to realize a sequence alignment that does not depend on manually designed feature representations, that can derive and use more complex feature representations, and that can at the same time derive and use a monotonic and continuous correspondence function. More complex feature representations can be realized without relying on manually designed feature representations. It is also possible to improve inference accuracy while shortening the learning time.

With the learning device 2, the learning method and the program, the training of the mathematical model by the update unit 24 can be guided so that the attention mechanism 11 can derive a monotonic and continuous correspondence function. Using the attention mechanism 11 of the trained mathematical model makes it possible to correctly derive the distance or similarity between arrays in applied problems such as matching or classification, and to correctly infer whether arrays belong to different classes. It also becomes possible to shorten the learning time (the time required to train the mathematical model) needed before the attention mechanism 11 provides sufficient performance.

(Second Embodiment)
The second embodiment applies the learning method and the inference method to applied problems such as the synthesis or conversion of continuous data such as speech. Speech synthesis is the artificial creation of human speech, for example, synthesizing speech from text. Speech conversion is the conversion of one individual's speech into the speech of another individual or character.

なお、連続データとなるように不連続データ（例えば、手書き署名）が予め補正されるのであれば、第２実施形態における学習方法及び推論方法を不連続データに対して使うことは可能である。 If the discontinuous data (for example, a handwritten signature) is corrected in advance so as to be continuous data, the learning method and the inference method in the second embodiment can be used for the discontinuous data.

第２実施形態は、学習段階と実行段階とに分けられる。学習段階では、学習装置は、学習データを使用して、多数のパラメータを持つ数理モデルを学習する。学習装置は、数理モデルのパラメータの数値を決定する。実行段階では推論装置は、学習済の数理モデルを使用して、所定の目的（例えば、音声合成、音声変換）のタスクを実行する。 The second embodiment is divided into a learning stage and an execution stage. In the learning stage, the learning device uses the training data to train a mathematical model with many parameters. The learning device determines the numerical values of the parameters of the mathematical model. At the execution stage, the inference device uses the trained mathematical model to perform a task of a predetermined purpose (for example, speech synthesis, speech conversion).

まず、実行段階における、音声合成又は音声変換などの応用問題に適用される推論方法について説明する。 First, an inference method applied to an applied problem such as speech synthesis or speech conversion in the execution stage will be described.

図６は、第２実施形態における、推論装置３の構成例を示す図である。音声合成では、第１配列の要素は、例えば、文章の各単語の特徴を表す数値ベクトルである。文章の各単語の特徴は、例えば、単語のＯｎｅ－Ｈｏｔベクトルである。第２配列の要素は、例えば、音声の各時刻又は各フレームの特徴を表す数値ベクトルである。 FIG. 6 is a diagram showing a configuration example of the inference device 3 in the second embodiment. In speech synthesis, the elements of the first array are, for example, numerical vectors representing the characteristics of each word in the sentence. The characteristic of each word in a sentence is, for example, the One-Hot vector of the word. The element of the second array is, for example, a numerical vector representing the characteristics of each time or each frame of the voice.
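The one-hot representation of words mentioned above can be illustrated with a minimal sketch. The toy vocabulary, the function name `one_hot`, and the variable names are assumptions made for illustration only; the source does not specify a vocabulary or an implementation.

```python
def one_hot(index, vocab_size):
    """Return a list of length vocab_size with 1.0 at position index and 0.0 elsewhere."""
    if not 0 <= index < vocab_size:
        raise ValueError("index out of range")
    vec = [0.0] * vocab_size
    vec[index] = 1.0
    return vec


# Hypothetical toy vocabulary; a real system would use a far larger one.
vocab = {"hello": 0, "world": 1, "again": 2}
sentence = ["hello", "again"]

# Each element of the first array is the one-hot vector of one word.
first_array = [one_hot(vocab[word], len(vocab)) for word in sentence]
```

Here `first_array` has one vector per word, so its length equals the number of words in the sentence.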

In speech conversion, the elements of the first array are, for example, numerical vectors representing the features of each time or each frame of speech. The features of each time or each frame of speech are, for example, a multidimensional vector containing mel-cepstral coefficients and a logarithmic F0 pattern, extracted using a predetermined extraction method (Reference 1: Masanori Morise, Fumiya Yokomori, Kenji Ozawa, "WORLD: A vocoder-based high-quality speech synthesis system for real-time applications," IEICE Trans. Inf. Syst. 99-D (7): 1877-1884 (2016)). The elements of the second array are, for example, numerical vectors representing the features of each time or each frame in the speech of an individual or character different from the speaker of the first array.

The inference device 3 includes a first coding unit 30, a second coding unit 31, an attention mechanism 32, a decoding unit 33, and an inference unit 34.

The first coding unit 30 acquires the first array as an input. The first coding unit 30 derives the first feature array by executing the coding process on the first array, for example, only once. The first coding unit 30 outputs the first feature array to the attention mechanism 32 and the decoding unit 33.

第２符号化部３１は、１個前の時刻における第２配列の要素を、復号化部３３から取得する。第２符号化部３１は、１個前の時刻における第２配列の要素に対する符号化処理を実行することによって、１個前の時刻における第２特徴配列の要素を導出する。第２符号化部３１は、１個前の時刻における第２特徴配列の要素を、注意機構３２に出力する。 The second coding unit 31 acquires the elements of the second array at the previous time from the decoding unit 33. The second coding unit 31 derives the elements of the second feature array at the previous time by executing the coding process for the elements of the second array at the previous time. The second coding unit 31 outputs the element of the second feature array at the previous time to the attention mechanism 32.

The attention mechanism 32 acquires the first feature array from the first coding unit 30. The attention mechanism 32 acquires the element of the second feature array at the previous time from the second coding unit 31. Using the element of the second feature array at the previous time and each element of the first feature array, the attention mechanism 32 derives the weight of each element of the first feature array with respect to the element of the second array at the current time. The attention mechanism 32 outputs these weights to the decoding unit 33 as a weight matrix.

復号化部３３は、第１特徴配列を第１符号化部３０から取得する。復号化部３３は、現在の時刻における第２配列の要素に対する第１特徴配列の各要素の重みを、重み行列として注意機構３２から取得する。復号化部３３は、現在の時刻における第２配列の要素に対する第１特徴配列の各要素の重みと、第１特徴配列とに基づいて、現在の時刻における第２配列の要素を導出する。復号化部３３は、現在の時刻における第２配列の要素を、第２符号化部３１と推論部３４とに出力する。なお、復号化部３３は、現在の時刻における第２配列の要素を、所定の外部装置（不図示）に出力してもよい。 The decoding unit 33 acquires the first feature array from the first coding unit 30. The decoding unit 33 acquires the weight of each element of the first feature array with respect to the element of the second array at the current time from the attention mechanism 32 as a weight matrix. The decoding unit 33 derives the elements of the second array at the current time based on the weight of each element of the first feature array with respect to the elements of the second array at the current time and the first feature array. The decoding unit 33 outputs the elements of the second array at the current time to the second coding unit 31 and the inference unit 34. The decoding unit 33 may output the elements of the second array at the current time to a predetermined external device (not shown).

第２符号化部３１は、現在の時刻における第２配列の要素を、復号化部３３から取得する。第２符号化部３１は、現在の時刻における第２配列の要素を使用して、現在の時刻における第２特徴配列の要素を導出する。第２符号化部３１は、現在の時刻における第２特徴配列の要素を、注意機構３２に出力する。 The second coding unit 31 acquires the elements of the second array at the current time from the decoding unit 33. The second coding unit 31 uses the elements of the second array at the current time to derive the elements of the second feature array at the current time. The second coding unit 31 outputs the elements of the second feature array at the current time to the attention mechanism 32.

In this way, the inference device 3 contains a cycle in which a signal starts from the second coding unit 31, passes through the attention mechanism 32 and the decoding unit 33, and returns to the second coding unit 31. The element of the second array is initialized at the first time, and the initialized element is input to the second coding unit 31. The inference processing in this cycle is then repeated every unit time until the decoding unit 33 outputs the element of the second array at the last time.
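The cycle described above can be sketched as a simple loop. This is only an illustration under stated assumptions: the function `run_inference_loop`, its parameters, the fixed step count `num_steps`, and the toy stand-ins for the three units are all hypothetical, and the source does not say how the last time is determined.

```python
def run_inference_loop(first_features, encode2, attend, decode, init_element, num_steps):
    """Run the second coding unit -> attention mechanism -> decoding unit cycle."""
    second_array = []   # elements of the second array, one per time step
    weight_rows = []    # one row of attention weights per time step
    prev_element = init_element
    for _ in range(num_steps):
        prev_feature = encode2(prev_element)             # second coding unit
        weights = attend(prev_feature, first_features)   # attention mechanism
        current = decode(weights, first_features)        # decoding unit
        second_array.append(current)
        weight_rows.append(weights)
        prev_element = current                           # fed back on the next step
    return second_array, weight_rows


# Toy stand-ins (assumptions, not the networks described in the text).
def toy_encode2(element):
    return element

def toy_attend(prev_feature, first_features):
    n = len(first_features)
    return [1.0 / n] * n  # uniform weights

def toy_decode(weights, first_features):
    return sum(w * x for w, x in zip(weights, first_features))

second_array, weight_rows = run_inference_loop(
    [1.0, 3.0], toy_encode2, toy_attend, toy_decode, init_element=0.0, num_steps=3
)
```

Stacking `weight_rows` gives the weight matrix that covers all time steps, and `second_array` is what the inference unit would receive.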

注意機構３２は、第２配列の各要素に対する第１特徴配列の各要素の重みを全て含む行列を、重み行列として復号化部３３に出力する。また、復号化部３３は、全ての時刻における第２配列の各要素を、第２配列として推論部３４に出力する。 The attention mechanism 32 outputs a matrix including all the weights of each element of the first feature array to each element of the second array to the decoding unit 33 as a weight matrix. Further, the decoding unit 33 outputs each element of the second array at all times to the inference unit 34 as the second array.

推論部３４は、第２配列を、復号化部３３から取得する。推論部３４は、第２配列に基づいて推論結果を生成する。音声合成又は音声変換等の応用問題では、推論結果は、音声信号である。推論部３４は、所定の外部装置（不図示）に推論結果を出力する。 The inference unit 34 acquires the second array from the decoding unit 33. The inference unit 34 generates an inference result based on the second array. In applied problems such as speech synthesis or speech conversion, the inference result is a speech signal. The inference unit 34 outputs the inference result to a predetermined external device (not shown).

The details of the functional units of the inference device 3 will be described.
<First coding unit 30>
The first coding unit 30 acquires the first array as an input. Using the first array, the first coding unit 30 derives, as the first feature array, an array whose elements are numerical values or numerical vectors. For example, the first coding unit 30 derives the first feature array from the first array using the artificial neural network of Reference 2 (Jonathan Shen, Ruoming Pang, Ron J. Weiss, Mike Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, RJ Skerry-Ryan, Rif A. Saurous, Yannis Agiomyrgiannakis, and Yonghui Wu, "Natural TTS synthesis by conditioning WaveNet on mel spectrogram predictions," In ICASSP, pp. 4779-4783, 2018). The first coding unit 30 determines the parameters of the artificial neural network using the training data in the learning stage. The first coding unit 30 outputs the first feature array to the attention mechanism 32 and the decoding unit 33.

The details of the processing of the first coding unit 30 are as follows.
The first array is, for example, a "1 x N x 512" tensor. "N" represents the length of the array. "512" is an example of the number of dimensions of the elements of the array. The first coding unit 30 inputs the first array to the artificial neural network.

The artificial neural network includes, for example, three "1 x 5 x 512" convolution layers and one bidirectional long short-term memory (BiLSTM) (hereinafter referred to as "bidirectional LSTM"). A batch normalization layer is provided immediately after each convolution layer. A ReLU layer is provided as the activation function immediately after each batch normalization layer. The bidirectional LSTM has a total of 512 hidden units. The bidirectional LSTM of the first coding unit 30 outputs an array whose elements are numerical values or numerical vectors to the attention mechanism 32 and the decoding unit 33 as the first feature array.

<Second coding unit 31>
The second coding unit 31 acquires the second array from the decoding unit 33. Specifically, it acquires the element of the second array at the previous time from the decoding unit 33. Using the element of the second array at the previous time, the second coding unit 31 derives a numerical value or a numerical vector as the element of the second feature array at the previous time. For this derivation, for example, the artificial neural network of Reference 2 described above can be used. The parameters of the artificial neural network are determined using the training data in the learning stage. The second coding unit 31 outputs the second feature array to the attention mechanism 32.

The details of the processing of the second coding unit 31 are as follows.
Each element of the second array at the previous time is, for example, a 512-dimensional numerical vector. The second coding unit 31 inputs each element of the second array at the previous time to the artificial neural network. This artificial neural network includes, for example, two fully connected layers. Each fully connected layer has 256 hidden units. A ReLU layer is provided as the activation function immediately after each fully connected layer. The last fully connected layer outputs a numerical value or a numerical vector to the attention mechanism 32 as the element of the second feature array at the previous time.
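As a rough illustration of the layer sizes given above (512-dimensional input, two fully connected layers of 256 hidden units, ReLU after each), the following sketch runs one forward pass in plain Python. The random initialization, the helper names, and the use of untrained weights are assumptions; in the described device the parameters are determined in the learning stage.

```python
import random


def dense(x, weights, biases):
    """Fully connected layer: one output per row of weights."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


def init_layer(rng, n_in, n_out):
    """Hypothetical random initialization (trained values would be used in practice)."""
    weights = [[rng.uniform(-0.05, 0.05) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def encode_second_element(element, layers):
    """Apply each fully connected layer followed by ReLU."""
    h = list(element)
    for weights, biases in layers:
        h = relu(dense(h, weights, biases))
    return h


rng = random.Random(0)
layers = [init_layer(rng, 512, 256), init_layer(rng, 256, 256)]
prev_element = [rng.uniform(-1.0, 1.0) for _ in range(512)]   # element at the previous time
prev_feature = encode_second_element(prev_element, layers)    # 256-dimensional feature
```

The output `prev_feature` plays the role of the element of the second feature array that is passed to the attention mechanism.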

<Attention mechanism 32>
The attention mechanism 32 acquires the first feature array from the first coding unit 30. The attention mechanism 32 acquires the second feature array from the second coding unit 31. Using the element of the second feature array at the previous time and each element of the first feature array, the attention mechanism 32 derives the weight of each element of the first feature array with respect to the element of the second array at the current time. As the attention mechanism 32, for example, an artificial neural network may be used, or a mathematical model other than an artificial neural network (for example, a linear regression model, a polynomial regression model, or a logistic regression model) may be used. The parameters of the artificial neural network are determined using the training data in the learning stage. The attention mechanism 32 outputs the weight matrix to the decoding unit 33.

The details of the processing of the attention mechanism 32 are as follows.
The attention mechanism 32 concatenates the numerical vector that is the element of the second feature array at the previous time with the numerical vector that is each element of the first feature array, along the dimension direction of the numerical vectors. The attention mechanism 32 inputs the concatenated numerical vector to the artificial neural network. The artificial neural network includes, for example, three fully connected layers. The first fully connected layer has 64 hidden units, the second fully connected layer has 16 hidden units, and the third fully connected layer has one hidden unit. A ReLU layer is provided as the activation function immediately after the first fully connected layer and immediately after the second fully connected layer. The third fully connected layer outputs one real number.

注意機構３２は、１個前の時刻における第２特徴配列の要素と第１特徴配列の各要素とを使用して導出された実数を全て含む配列を、Ｓｏｆｔｍａｘ関数によって正規化する。この導出された実数を全て含む配列とは、第１特徴配列の各要素に対して出力された実数を配列としてまとめたものである。導出された実数を全て含む配列は、第１特徴配列の要素数と同じ数の実数を含む。注意機構３２は、現在の時刻における第２配列の要素に対する第１特徴配列の各要素の重みとして、正規化された実数を導出する。注意機構３２は、現在の時刻における第２配列の要素に対する第１特徴配列の各要素の重みを全て含む行列を、重み行列として復号化部３３に出力する。 Attention mechanism 32 normalizes an array containing all the real numbers derived by using the elements of the second feature array and each element of the first feature array at the previous time by the Softmax function. The array including all the derived real numbers is a collection of the real numbers output for each element of the first feature array as an array. An array containing all the derived real numbers contains the same number of real numbers as the number of elements in the first feature array. Attention mechanism 32 derives a normalized real number as the weight of each element of the first feature array with respect to the elements of the second array at the current time. The attention mechanism 32 outputs a matrix including all the weights of each element of the first feature array to the elements of the second array at the current time as a weight matrix to the decoding unit 33.
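The scoring and Softmax normalization described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the feature dimension `feat_dim`, the random untrained weights, and all function names are assumptions. Only the layer widths (64, 16, 1), the ReLU placement, the concatenation, and the Softmax come from the text.

```python
import math
import random


def dense(x, weights, biases):
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(scores):
    m = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(rng, n_in, n_out):
    """Hypothetical random initialization; trained parameters would be used in practice."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def attention_weights(prev_feature, first_features, layers):
    """Score every element of the first feature array, then normalize with Softmax."""
    scores = []
    for x in first_features:
        h = list(prev_feature) + list(x)        # concatenate along the feature dimension
        for k, (weights, biases) in enumerate(layers):
            h = dense(h, weights, biases)
            if k < len(layers) - 1:             # ReLU after the 1st and 2nd layers only
                h = relu(h)
        scores.append(h[0])                     # third layer outputs one real number
    return softmax(scores)


rng = random.Random(0)
feat_dim = 4  # assumed small feature size to keep the example fast
layers = [init_layer(rng, 2 * feat_dim, 64), init_layer(rng, 64, 16), init_layer(rng, 16, 1)]
first_features = [[rng.random() for _ in range(feat_dim)] for _ in range(5)]
prev_feature = [rng.random() for _ in range(feat_dim)]
weights = attention_weights(prev_feature, first_features, layers)
```

The result contains one weight per element of the first feature array, and the weights sum to one, which is what allows them to be read as one row of the weight matrix.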

<Decoding unit 33>
The decoding unit 33 acquires the first feature array from the first coding unit 30. The decoding unit 33 acquires the weight matrix from the attention mechanism 32. The decoding unit 33 weights each element of the first feature array using the weight of that element with respect to the element of the second array at the current time. The decoding unit 33 uses the numerical value or numerical vector obtained by the weighting to derive the element of the second array at the current time. For example, the decoding unit 33 derives the element of the second array at the current time using the artificial neural network of Reference 2 described above. The decoding unit 33 determines the parameters of the artificial neural network using the training data in the learning stage. The decoding unit 33 outputs the second array to the inference unit 34.

The details of the processing of the decoding unit 33 are as follows.
The decoding unit 33 derives the weighted sum of all the elements of the first feature array, using the weight of each element of the first feature array with respect to the element of the second array at the current time. As a result, the element of the first feature array that corresponds to the element of the second array at the current time is specified (extracted or generated) as the weighted sum. That is, the element of the first feature array that corresponds to the element of the second array at the current time is aligned. Therefore, the non-linear temporal variation between the first array and the second array, namely local shifts and changes in speed, is compensated.

Here, the first feature array is denoted $X \in \mathbb{R}^{W \times K}$. The weight matrix is denoted $P \in \mathbb{R}^{W \times W}$. The row vector containing all the weights of the elements of the first feature array with respect to the element of the second array at the current time is denoted $p_i \in \mathbb{R}^{1 \times W}$, where $i$ denotes the current time. The weighted sum of all the elements of the first feature array with respect to the element of the second array at the current time is denoted $p_i X$.
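The weighted sum defined here is a row vector of weights (length W) multiplied by the W x K first feature array. A small worked example, with toy values chosen purely for illustration (the function name `weighted_sum` is also an assumption), may make the shapes concrete:

```python
def weighted_sum(p_i, X):
    """Compute the row-vector-times-matrix product p_i X for a 1 x W row and a W x K matrix."""
    if len(p_i) != len(X):
        raise ValueError("p_i must have one weight per row of X")
    num_cols = len(X[0])
    return [sum(p_i[w] * X[w][k] for w in range(len(X))) for k in range(num_cols)]


# Toy values: W = 3 elements in the first feature array, K = 2 dimensions each.
X = [[1.0, 0.0],
     [0.0, 1.0],
     [2.0, 2.0]]
p_i = [0.5, 0.25, 0.25]   # weights for the current time i; they sum to 1

context = weighted_sum(p_i, X)   # [0.5*1 + 0.25*0 + 0.25*2, 0.5*0 + 0.25*1 + 0.25*2]
```

The result has K entries, so the weighted sum has the same dimensionality as a single element of the first feature array.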

加重総和は、例えば、１２８次元の数値ベクトルである。復号化部３３は、この数値ベクトルを、人工ニューラルネットワークに入力する。人工ニューラルネットワークは、例えば、２個の双方向ＬＳＴＭと１個の全結合層とを備える。各双方向ＬＳＴＭは、１０２４個の隠れユニットを有する。全結合層は、数値又は数値ベクトルを、現在の時刻における第２配列の要素として推論部３４に出力する。 The weighted sum is, for example, a 128-dimensional numerical vector. The decoding unit 33 inputs this numerical vector into the artificial neural network. The artificial neural network includes, for example, two bidirectional LSTMs and one fully connected layer. Each bidirectional LSTM has 1024 hidden units. The fully connected layer outputs a numerical value or a numerical vector to the inference unit 34 as an element of the second array at the current time.

The decoding unit 33 may also derive the second array using the second feature array output from the second coding unit 31, the first feature array, and the weight matrix. In this case, the decoding unit 33 concatenates the numerical vector that is the weighted sum with the numerical vector that is the element of the second feature array at the previous time, along the dimension direction of the numerical vectors. The decoding unit 33 inputs the concatenated numerical vector to the artificial neural network.

<Inference unit 34>
The inference unit 34 acquires the second array from the decoding unit 33. The inference unit 34 generates an inference result based on the second array. In applied problems such as speech synthesis or speech conversion, the inference result is a speech signal. The inference unit 34 generates the speech signal based on the second array using, for example, a predetermined generation method (Reference 3: Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew W. Senior, Koray Kavukcuoglu, "WaveNet: A generative model for raw audio," SSW 2016: 125). The inference unit 34 outputs the inference result to a predetermined external device (not shown).

次に、学習段階における、音声合成又は音声変換などの応用問題に適用される学習方法について説明する。 Next, a learning method applied to an applied problem such as speech synthesis or speech conversion in the learning stage will be described.

図７は、第２実施形態における、学習装置４の構成例を示す図である。第２実施形態の学習段階では、音声合成又は音声変換などの応用問題に学習方法が適用される。学習装置４は、第１配列と正解配列とを入力として取得する。学習装置４は、目的関数値と制約関数値とを導出する。学習装置４は、目的関数値と制約関数値とに基づいて数理モデルを学習し、学習済の数理モデル（学習結果）を、所定の外部装置（不図示）に出力する。また、学習装置４は、学習済の数理モデルを、実行段階よりも前に推論装置３に出力する。 FIG. 7 is a diagram showing a configuration example of the learning device 4 in the second embodiment. In the learning stage of the second embodiment, the learning method is applied to an applied problem such as speech synthesis or speech conversion. The learning device 4 acquires the first array and the correct answer array as inputs. The learning device 4 derives the objective function value and the constraint function value. The learning device 4 learns a mathematical model based on the objective function value and the constraint function value, and outputs the trained mathematical model (learning result) to a predetermined external device (not shown). Further, the learning device 4 outputs the trained mathematical model to the inference device 3 before the execution stage.

第１配列と正解配列とは、所定の目的（例えば、音声合成又は音声変換）のタスクを実行するための数理モデルを学習するために使用される学習データである。目的関数値と制約関数値とは、数理モデルを学習装置４が学習するために使用される。例えば、多数の学習データを使用して導出された目的関数値と制約関数値との加重総和又は加重平均が可能な限り小さくなるように（例えば、最小になるように）、学習装置４は、数理モデルのパラメータを更新する。学習データの数が多いほど、数理モデルの性能が向上する。学習データの数は、例えば、２万から３万程度である。 The first array and the correct array are training data used to train a mathematical model for performing a task of a predetermined purpose (for example, speech synthesis or speech conversion). The objective function value and the constraint function value are used for the learning device 4 to learn the mathematical model. For example, the learning device 4 is designed so that the weighted sum or weighted average of the objective function value and the constraint function value derived using a large number of training data is as small as possible (for example, to be the minimum). Update the parameters of the mathematical model. The larger the number of training data, the better the performance of the mathematical model. The number of training data is, for example, about 20,000 to 30,000.

The learning device 4 includes a first coding unit 40, a second coding unit 41, an attention mechanism 42, a decoding unit 43, an objective function value derivation unit 44, a constraint function value derivation unit 45, and an update unit 46.

The first coding unit 40 acquires the first array as an input. The first coding unit 40 derives the first feature array by executing the coding process on the first array, for example, only once. The first coding unit 40 outputs the first feature array to the attention mechanism 42 and the decoding unit 43.

第２符号化部４１は、１個前の時刻における第２配列の要素を、復号化部４３から取得する。第２符号化部４１は、１個前の時刻における第２配列の要素に対する符号化処理を実行することによって、１個前の時刻における第２特徴配列の要素を導出する。 The second coding unit 41 acquires the elements of the second array at the previous time from the decoding unit 43. The second coding unit 41 derives the elements of the second feature array at the previous time by executing the coding process for the elements of the second array at the previous time.

The attention mechanism 42 acquires the first feature array from the first coding unit 40. The attention mechanism 42 acquires the element of the second feature array at the previous time from the second coding unit 41. Using the element of the second feature array at the previous time and each element of the first feature array, the attention mechanism 42 derives the weight of each element of the first feature array with respect to the element of the second array at the current time. The attention mechanism 42 outputs these weights to the decoding unit 43 as a weight matrix.

復号化部４３は、第１特徴配列を第１符号化部４０から取得する。復号化部４３は、現在の時刻における第２配列の要素に対する第１特徴配列の各要素の重みを、重み行列として注意機構４２から取得する。復号化部４３は、現在の時刻における第２配列の要素に対する第１特徴配列の各要素の重みと、第１特徴配列とに基づいて、現在の時刻における第２配列の要素を導出する。復号化部４３は、現在の時刻における第２配列の要素を、第２符号化部４１と目的関数値導出部４４とに出力する。 The decoding unit 43 acquires the first feature array from the first coding unit 40. The decoding unit 43 acquires the weight of each element of the first feature array with respect to the element of the second array at the current time from the attention mechanism 42 as a weight matrix. The decoding unit 43 derives the elements of the second array at the current time based on the weight of each element of the first feature array with respect to the elements of the second array at the current time and the first feature array. The decoding unit 43 outputs the elements of the second array at the current time to the second coding unit 41 and the objective function value derivation unit 44.

第２符号化部４１は、現在の時刻における第２配列の要素を、復号化部４３から取得する。第２符号化部４１は、現在の時刻における第２配列の要素を使用して、現在の時刻における第２特徴配列の要素を導出する。第２符号化部４１は、現在の時刻における第２特徴配列の要素を、注意機構４２に出力する。 The second coding unit 41 acquires the elements of the second array at the current time from the decoding unit 43. The second coding unit 41 uses the elements of the second array at the current time to derive the elements of the second feature array at the current time. The second coding unit 41 outputs the elements of the second feature array at the current time to the attention mechanism 42.

In this way, the learning device 4 contains a cycle in which a signal starts from the second coding unit 41, passes through the attention mechanism 42 and the decoding unit 43, and returns to the second coding unit 41. In this cycle, the element of the second array is initialized at the first time, and the initialized element is input to the second coding unit 41. The learning processing is then repeated every unit time until the decoding unit 43 outputs the element of the second array at the last time.

注意機構４２は、第２配列の各要素に対する第１特徴配列の各要素の重みを全て含む行列を、重み行列として復号化部４３に出力する。また、復号化部４３は、全ての時刻における第２配列の各要素を、第２配列として第２符号化部４１と目的関数値導出部４４とに出力する。 The attention mechanism 42 outputs a matrix including all the weights of each element of the first feature array to each element of the second array to the decoding unit 43 as a weight matrix. Further, the decoding unit 43 outputs each element of the second array at all times to the second coding unit 41 and the objective function value derivation unit 44 as the second array.

目的関数値導出部４４は、正解配列を入力として取得する。目的関数値導出部４４は、第２配列を復号化部４３から取得する。目的関数値導出部４４は、正解配列と第２配列とに基づいて、目的関数値を導出する。目的関数値導出部４４が目的関数値を導出する処理は、例えば１回だけ実行される。目的関数値導出部４４は、目的関数値を更新部４６に出力する。 The objective function value derivation unit 44 acquires the correct answer array as an input. The objective function value derivation unit 44 acquires the second array from the decoding unit 43. The objective function value derivation unit 44 derives the objective function value based on the correct answer array and the second array. The process of deriving the objective function value by the objective function value deriving unit 44 is executed only once, for example. The objective function value derivation unit 44 outputs the objective function value to the update unit 46.

制約関数値導出部４５は、重み行列を注意機構４２から取得する。制約関数値導出部４５は、重み行列を使用して、制約関数値を導出する。制約関数値導出部４５が制約関数値を導出する処理は、例えば１回だけ実行される。制約関数値導出部４５は、制約関数値を更新部４６に出力する。 The constraint function value derivation unit 45 acquires the weight matrix from the attention mechanism 42. The constraint function value derivation unit 45 derives the constraint function value using the weight matrix. The process of deriving the constraint function value by the constraint function value deriving unit 45 is executed only once, for example. The constraint function value derivation unit 45 outputs the constraint function value to the update unit 46.

The update unit 46 acquires the objective function value from the objective function value derivation unit 44. The update unit 46 acquires the constraint function value from the constraint function value derivation unit 45. The update unit 46 executes the learning process based on the objective function value and the constraint function value. The update unit 46 updates the mathematical model, which includes the first coding unit 40, the second coding unit 41, the attention mechanism 42, and the decoding unit 43, so that the weighted sum or weighted average of the constraint function value and the objective function value becomes as small as possible (for example, minimal). The update unit 46 outputs the trained mathematical model (learning result) to a predetermined external device (not shown).
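The quantity that the update unit minimizes can be sketched as a weighted sum of the two values. In this illustration, the mean squared error used as the objective, the weight `constraint_weight`, and the placeholder constraint value are all assumptions; this passage does not fix the form of either function or the weighting.

```python
def mean_squared_error(second_array, correct_array):
    """Assumed objective: mean squared difference between output and correct array."""
    if len(second_array) != len(correct_array):
        raise ValueError("arrays must have the same length")
    return sum((a - b) ** 2 for a, b in zip(second_array, correct_array)) / len(correct_array)


def total_loss(objective_value, constraint_value, constraint_weight=0.5):
    """Weighted sum of the objective function value and the constraint function value."""
    return objective_value + constraint_weight * constraint_value


objective_value = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])   # 4/3
constraint_value = 4.0   # placeholder; it would be derived from the weight matrix
loss = total_loss(objective_value, constraint_value, constraint_weight=0.25)
```

A training procedure would then adjust the parameters of both coding units, the attention mechanism, and the decoding unit to reduce `loss`, typically by a gradient-based method.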

The functional units of the learning device 4 are described in detail below.
<First coding unit 40>
The first coding unit 40 acquires the first array as an input. The process executed by the first coding unit 40 in the learning stage is the same as the process executed by the first coding unit 30 in the execution stage. The first coding unit 40 outputs the first feature array to the attention mechanism 42 and the decoding unit 43.

<Second coding unit 41>
The second coding unit 41 acquires the second array from the decoding unit 43 and outputs the second feature array to the attention mechanism 42. The processing of the second coding unit 41 in the learning stage is the same as the processing of the second coding unit 31 in the execution stage. In the learning stage, the second coding unit 41 may use the correct answer array as its input instead of the second array. In this case, all processing that would be performed on the second array is performed on the correct answer array used in its place.

<Attention mechanism 42>
The attention mechanism 42 acquires the first feature array from the first coding unit 40 and the second feature array from the second coding unit 41. The processing of the attention mechanism 42 in the learning stage is the same as the processing of the attention mechanism 32 in the execution stage. The attention mechanism 42 outputs the weight matrix to the decoding unit 43 and the constraint function value derivation unit 45.

<Decoding unit 43>
The decoding unit 43 acquires the first feature array from the first coding unit 40 and the weight matrix from the attention mechanism 42. The processing of the decoding unit 43 in the learning stage is the same as the processing of the decoding unit 33 in the execution stage. The decoding unit 43 outputs the second array to the objective function value derivation unit 44.
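The processing of the decoding unit itself is defined with the decoding unit 33 and is not repeated in this passage. As a rough illustration only, one common way for a decoder to consume attention weights is to form a weighted sum of the elements of the first feature array for each output time. Whether the decoding unit 43 does exactly this is an assumption, and the name `context_vector` is hypothetical.

```python
def context_vector(weights_t, first_features):
    """Weighted sum of the first feature array's elements for one output time.

    weights_t[i] is the weight of the i-th element of the first feature array
    for the current element of the second array (assumed to sum to 1).
    """
    dim = len(first_features[0])
    return [
        sum(w * x[d] for w, x in zip(weights_t, first_features))
        for d in range(dim)
    ]
```

For example, with weights `[0.25, 0.75]` and features `[[1.0, 0.0], [0.0, 2.0]]`, the sketch mixes the two feature vectors in a 1:3 ratio.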

<Objective function value derivation unit 44>
The objective function value derivation unit 44 acquires the correct answer array as an input and acquires the second array from the decoding unit 43. It derives the difference between the correct answer array and the second array, and derives an objective function value that becomes larger as the derived difference becomes larger. The objective function value derivation unit 44 outputs the objective function value to the update unit 46.

The details of the processing of the objective function value derivation unit 44 are as follows.
The objective function value derivation unit 44 derives, for example, the residual sum of squares (similarity) between the correct answer array and the second array as the objective function value. Here, the correct answer array is written as "Z^*" and the second array is written as "Z". The objective function value is therefore expressed as in Eq. (8).
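Equation (8) itself is not reproduced in this text. A form consistent with the description (a residual sum of squares between $Z^{*}$ and $Z$, written with the L2 norm) would be the following. The symbol $\mathcal{L}_{\mathrm{obj}}$ for the objective function value is an assumption.

```latex
\mathcal{L}_{\mathrm{obj}} = \left\| Z^{*} - Z \right\|^{2}
```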

Here, "||·||" represents the L2 norm.

<Constraint function value derivation unit 45>
The constraint function value derivation unit 45 acquires the weight matrix from the attention mechanism 42 and uses it to derive the constraint function value. The constraint function value is derived so that it becomes smaller as the degree to which at least one of the monotonic constraint and the continuity constraint is satisfied becomes greater. The constraint function value derivation unit 45 outputs the constraint function value to the update unit 46.

By minimizing the constraint function value, the mathematical model is trained to derive a weight matrix in which the correspondence between each element of the first feature array and each element of the second array satisfies at least one of the monotonic constraint and the continuity constraint. This mathematical model includes the first coding unit 40, the second coding unit 41, the attention mechanism 42, and the decoding unit 43.

The details of the processing of the constraint function value derivation unit 45 are as follows.
The weight matrix is a matrix that represents the probability that each element of the first feature array and each element of the second array are in a correspondence relationship. The weight matrix is not the correspondence itself. Therefore, the degree to which at least one of the monotonic constraint and the continuity constraint is satisfied cannot be evaluated directly from the weight matrix.

To evaluate the degree to which at least one of the monotonic constraint and the continuity constraint is satisfied, the weight matrix must be transformed. For example, it must be transformed into the form of a function (correspondence function) whose independent variable is the time of each element of the second array and whose dependent variable is the subscript of the element of the first feature array that corresponds to that time. For this purpose, the constraint function value derivation unit 45 derives the product of the weight matrix and a predetermined arithmetic progression as a correspondence array. An arithmetic progression is a sequence in which each term (element) is obtained by adding a constant (the common difference) to the immediately preceding term (element).
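The transformation described above can be sketched as a matrix-vector product. The sketch assumes that each row of the weight matrix corresponds to one element (time) of the second array and holds the weights over the elements of the first feature array. The function name and the parameters `first` and `step` (the first term and common difference of the progression) are hypothetical.

```python
def correspondence_array(weight_matrix, first=1.0, step=1.0):
    """Multiply each row of the weight matrix by an arithmetic progression.

    Row t holds the weights of every first-feature-array element for element t
    of the second array, so each result is a weighted (soft) subscript.
    """
    n_cols = len(weight_matrix[0])
    # Arithmetic progression: first, first + step, first + 2 * step, ...
    progression = [first + step * i for i in range(n_cols)]
    return [sum(w * p for w, p in zip(row, progression)) for row in weight_matrix]
```

Because the result is a plain weighted sum, it is differentiable with respect to the weight matrix, and each row can be computed independently of the others.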

For example, in FIG. 3, "[1,2,3,4]^T" is the arithmetic progression. In the correspondence array derived using the arithmetic progression, the subscript of the correspondence array represents the time of each element of the second array. Each numerical value in the correspondence array represents the subscript, or a value proportional to the subscript, of the element of the first feature array that corresponds to the respective element of the second array. In the example shown in the upper part of FIG. 3, the correspondence array indicates that the first element of the second array corresponds to the first element of the first feature array, the second element of the second array corresponds to the second element of the first feature array, and the third element of the second array also corresponds to the second element of the first feature array. The subscript of the element of the first feature array that corresponds to the fourth element of the second array is expressed not as an integer but as the real number "3.6". Using such a correspondence array makes it possible to evaluate the degree to which at least one of the monotonic constraint and the continuity constraint is satisfied.
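The entries of the weight matrix in FIG. 3 are not reproduced in this text. As a worked illustration, the matrix below is a hypothetical one, chosen only so that its product with the arithmetic progression gives the values described above (1, 2, 2 and 3.6). Each row corresponds to one element of the second array.

```latex
\begin{bmatrix}
1 & 0 & 0   & 0   \\
0 & 1 & 0   & 0   \\
0 & 1 & 0   & 0   \\
0 & 0 & 0.4 & 0.6
\end{bmatrix}
\begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix}
=
\begin{bmatrix} 1 \\ 2 \\ 2 \\ 0.4 \cdot 3 + 0.6 \cdot 4 \end{bmatrix}
=
\begin{bmatrix} 1 \\ 2 \\ 2 \\ 3.6 \end{bmatrix}
```

The non-integer value 3.6 arises because the fourth row spreads its probability over the third and fourth elements of the first feature array.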

The constraint function value derived using the correspondence array must become smaller as the degree to which at least one of the monotonic constraint and the continuity constraint is satisfied becomes greater. In addition, so that the learning device 4 can train the mathematical model using a gradient method, it is desirable that the constraint function value be differentiable with respect to the weight matrix or the correspondence array. To enable faster learning, it is also desirable that the derivation of the constraint function value be easy to parallelize.

The constraint function value derivation unit 45 derives at least one of the monotonic constraint function value and the continuity constraint function value as the constraint function value.

<Monotonic constraint function value>
The description of the monotonic constraint function value in the second embodiment is the same as the description of the monotonic constraint function value in the first embodiment.
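As an illustration of what a monotonic constraint function value over a correspondence array might look like, the sketch below penalizes every decrease between consecutive elements and then takes the sum or average of the local values. The hinge form `max(0, previous - current)` of the local value is an assumption made for this sketch, as is the function name. The definition given for the first embodiment takes precedence.

```python
def monotonicity_penalty(correspondence, average=False):
    """Penalise every decrease between consecutive correspondence-array elements.

    The hinge max(0, previous - current) is an assumed local penalty.
    """
    local = [
        max(0.0, prev - cur)
        for prev, cur in zip(correspondence, correspondence[1:])
    ]
    total = sum(local)
    return total / len(local) if average and local else total
```

A non-decreasing correspondence array such as `[1.0, 2.0, 2.0, 3.6]` yields zero under this sketch, while any backward step contributes its size to the value.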

<Continuity constraint function value>
The description of the continuity constraint function value in the second embodiment is the same as the description of the continuity constraint function value in the first embodiment.
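The continuity constraint function value can be sketched from the wording of the claims: for each element of the correspondence array, take the absolute difference from the preceding element, subtract a predetermined positive number, clamp the result at 0, and then take the sum or average of these local values. The function name and the default `max_step=1.0` for the predetermined positive number are hypothetical.

```python
def continuity_penalty(correspondence, max_step=1.0, average=False):
    """Penalise jumps larger than max_step between consecutive elements.

    max_step stands in for the predetermined positive number; its value
    here is an illustrative assumption.
    """
    local = [
        max(0.0, abs(cur - prev) - max_step)
        for prev, cur in zip(correspondence, correspondence[1:])
    ]
    total = sum(local)
    return total / len(local) if average and local else total
```

With `max_step=1.0`, the array `[1.0, 2.0, 2.0, 3.6]` is penalized only for its last step of 1.6, which exceeds the allowed step by about 0.6.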

<Update unit 46>
The update unit 46 acquires the objective function value from the objective function value derivation unit 44 and the constraint function value from the constraint function value derivation unit 45. The update unit 46 executes the learning process based on the objective function value and the constraint function value, and outputs the trained mathematical model (learning result) to a predetermined external device (not shown). The learning process is not limited to a specific learning process.

As described above, the attention mechanism 42 uses the first feature array based on the first array and the second feature array based on the second array to generate a weight matrix, which is a matrix representing the probability that each element of the first feature array and each element of the second feature array are in a correspondence relationship. The decoding unit 43 derives the element of the second array at the current time based on the first feature array and on the weight of each element of the first feature array with respect to the element of the second array at the current time. The objective function value derivation unit 44 derives the objective function value, which is a value corresponding to the correct answer array and the second array. The constraint function value derivation unit 45 derives the constraint function value based on the weight matrix. The update unit 46 executes a predetermined learning process based on the objective function value and the constraint function value, thereby updating the parameters of the mathematical model that includes the first coding unit 40, the second coding unit 41, the attention mechanism 42, and the decoding unit 43, and generating a learning result. The objective function value is, for example, the difference or the residual sum of squares between the correct answer array and the second array. The update unit 46 updates the mathematical model.

The mathematical model updated in the learning stage is used to execute the inference process in the execution stage. In the execution stage, the attention mechanism 32 uses the first feature array based on the first array and the second feature array based on the second array to generate a weight matrix, which is a matrix representing the probability that each element of the first feature array and each element of the second feature array are in a correspondence relationship. The decoding unit 33 derives the second array based on the first feature array and the weight matrix. The inference unit 34 generates an inference result by executing a predetermined inference process based on the second array.

In this way, the coding units derive the feature arrays using a mathematical model trained with a constraint function value that represents at least one of the monotonic constraint and the continuity constraint, so that the attention mechanism generates a weight matrix that works effectively.

As a result, for applications such as speech synthesis or voice conversion, it is possible to realize a sequence alignment that can derive and use more complex feature representations, and at the same time can derive and use a monotonic and continuous correspondence function, without relying on manually designed feature representations. More complex feature representations can be realized for applications such as speech synthesis or voice conversion without relying on manually designed feature representations. It is also possible to achieve both improved inference accuracy in tasks such as speech synthesis or voice conversion and reduced learning time.

FIG. 8 is a diagram showing an example of the hardware configuration of the inference device 1 in each embodiment. Some or all of the functional units of the inference device 1 are realized as software by a processor 100, such as a CPU (Central Processing Unit), executing a program stored in a storage unit 200 that has a non-volatile recording medium (non-transitory recording medium). The program may be recorded on a computer-readable recording medium. A computer-readable recording medium is a non-transitory recording medium, for example a portable medium such as a flexible disk, a magneto-optical disk, a ROM (Read Only Memory), or a CD-ROM (Compact Disc Read Only Memory), or a storage device such as a hard disk built into a computer system. The communication unit 300 transmits the processing result of the inference device 1 to an external device (not shown). The communication unit 300 may receive the program via a communication line. The display unit 400 displays the processing result of the inference device 1. The display unit 400 is, for example, a liquid crystal display or an organic EL (Electro Luminescence) display.

Some or all of the functional units of the inference device 1 may be realized using hardware that includes an electronic circuit (circuitry) using, for example, an LSI (Large Scale Integration circuit), an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), or an FPGA (Field Programmable Gate Array). The hardware configuration example of the inference device 3 is the same as that of the inference device 1.

FIG. 9 is a diagram showing an example of the hardware configuration of the learning device 2 in each embodiment. Some or all of the functional units of the learning device 2 are realized as software by a processor 101, such as a CPU, executing a program stored in a storage unit 201 that has a non-volatile recording medium (non-transitory recording medium). The program may be recorded on a computer-readable recording medium. A computer-readable recording medium is a non-transitory recording medium, for example a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or a storage device such as a hard disk built into a computer system. The communication unit 301 transmits the processing result of the learning device 2 to an external device (not shown). The communication unit 301 may receive the program via a communication line. The display unit 401 displays the processing result of the learning device 2. The display unit 401 is, for example, a liquid crystal display or an organic EL display.

Some or all of the functional units of the learning device 2 may be realized using hardware that includes an electronic circuit (circuitry) using, for example, an LSI, an ASIC, a PLD, or an FPGA. The hardware configuration example of the learning device 4 is the same as that of the learning device 2.

Although the embodiments of the present invention have been described in detail with reference to the drawings, the specific configuration is not limited to these embodiments and also includes designs and the like within a range that does not depart from the gist of the present invention.

The present invention is applicable to learning devices and inference devices.

1 ... inference device, 2 ... learning device, 3 ... inference device, 4 ... learning device, 10 ... coding unit, 11 ... attention mechanism, 12 ... collation unit, 13 ... inference unit, 20 ... coding unit, 21 ... attention mechanism, 22 ... objective function value derivation unit, 23 ... constraint function value derivation unit, 24 ... update unit, 30 ... first coding unit, 31 ... second coding unit, 32 ... attention mechanism, 33 ... decoding unit, 34 ... inference unit, 40 ... first coding unit, 41 ... second coding unit, 42 ... attention mechanism, 43 ... decoding unit, 44 ... objective function value derivation unit, 45 ... constraint function value derivation unit, 46 ... update unit, 100 ... processor, 101 ... processor, 200 ... storage unit, 201 ... storage unit, 300 ... communication unit, 301 ... communication unit, 400 ... display unit, 401 ... display unit

Claims

A learning device comprising:
an attention mechanism that uses a first feature array based on a first array and a second feature array based on a second array to generate a weight matrix, which is a matrix representing the probability that each element of the first feature array and each element of the second feature array are in a correspondence relationship;
an objective function value derivation unit that derives, based on the weight matrix, an objective function value that is a value corresponding to a label indicating whether or not the first array and the second array belong to the same class, the first feature array, and the second feature array; and
an update unit that generates a learning result by executing a predetermined learning process based on the objective function value.

The learning device according to claim 1, wherein the objective function value derivation unit determines the objective function value such that the difference or similarity between the first feature array and the second feature array, or between the second feature array and a feature array derived from the weight matrix, is associated with the label.

A learning device comprising:
an attention mechanism that uses a first feature array based on a first array and a second feature array based on a second array to generate a weight matrix, which is a matrix representing the probability that each element of the first feature array and each element of the second feature array are in a correspondence relationship;
a decoding unit that derives the element of the second array at the current time based on the first feature array and on the weight of each element of the first feature array with respect to the element of the second array at the current time;
an objective function value derivation unit that derives an objective function value that is a value corresponding to a correct answer array and the second array; and
an update unit that generates a learning result by executing a predetermined learning process based on the objective function value.

The learning device according to any one of claims 1 to 3, further comprising a constraint function value derivation unit that, when there is a correspondence between the elements of the first array and the elements of the second array, derives, based on the weight matrix, a constraint function value representing at least one of a monotonic constraint and a continuity constraint, wherein
the monotonic constraint is a constraint that the subscripts of the elements of the first array that correspond to the elements of the second array do not decrease as the subscripts of the elements of the second array increase,
the continuity constraint is a constraint that, when the subscripts of adjacent elements of the second array are consecutive, the difference between the subscripts of the elements of the first array that correspond to those adjacent elements is less than or equal to a predetermined positive value, and
the update unit generates the learning result by executing the predetermined learning process based on the objective function value and the constraint function value.

The learning device according to claim 4, wherein the constraint function value derivation unit makes the constraint function value smaller as the degree to which at least one of the monotonic constraint and the continuity constraint is satisfied becomes greater.

The learning device according to claim 4 or 5, wherein the constraint function value derivation unit derives the product of the weight matrix and a predetermined arithmetic progression as a correspondence array, and derives, as the monotonic constraint function value, the sum or average of all local monotonic constraint function values for all elements of the correspondence array.

The learning device according to claim 4 or 5, wherein the constraint function value derivation unit derives the product of the weight matrix and a predetermined arithmetic progression as a correspondence array; for each element of the correspondence array, derives the absolute value of the difference between that element and the immediately preceding element, subtracts a predetermined positive number from the derived absolute value, and derives the larger of the subtraction result and 0 as a local continuity constraint function value; and derives, as the continuity constraint function value, the sum or average of all local continuity constraint function values for all elements of the correspondence array.

An inference device comprising:
an attention mechanism that uses a first feature array based on a first array and a second feature array based on a second array to generate a weight matrix, which is a matrix representing the probability that each element of the first feature array and each element of the second feature array are in a correspondence relationship;
a collation unit that derives the distance between the first array and the second array based on the first feature array, the second feature array, and the weight matrix; and
an inference unit that generates an inference result by executing a predetermined inference process based on the distance.

An inference device comprising:
an attention mechanism that uses a first feature array based on a first array and a second feature array based on a second array to generate a weight matrix, which is a matrix representing the probability that each element of the first feature array and each element of the second feature array are in a correspondence relationship;
a decoding unit that derives the second array based on the first feature array and the weight matrix; and
an inference unit that generates an inference result by executing a predetermined inference process based on the second array.

A learning method executed by a learning device, the learning method comprising:
an attention step of using a first feature array based on a first array and a second feature array based on a second array to generate a weight matrix, which is a matrix representing the probability that each element of the first feature array and each element of the second feature array are in a correspondence relationship;
an objective function value derivation step of deriving, based on the weight matrix, an objective function value that is a value corresponding to a label indicating whether or not the first array and the second array belong to the same class, the first feature array, and the second feature array; and
an update step of generating a learning result by executing a predetermined learning process based on the objective function value.

An inference method executed by an inference device, the inference method comprising:
an attention step of using a first feature array based on a first array and a second feature array based on a second array to generate a weight matrix, which is a matrix representing the probability that each element of the first feature array and each element of the second feature array are in a correspondence relationship;
a collation step of deriving the distance between the first array and the second array based on the first feature array, the second feature array, and the weight matrix; and
an inference step of generating an inference result by executing a predetermined inference process based on the distance.

A learning method executed by a learning device, the learning method comprising:
an attention step of using a first feature array based on a first array and a second feature array based on a second array to generate a weight matrix, which is a matrix representing the probability that each element of the first feature array and each element of the second feature array are in a correspondence relationship;
a decoding step of deriving the element of the second array at the current time based on the first feature array and on the weight of each element of the first feature array with respect to the element of the second array at the current time;
an objective function value derivation step of deriving an objective function value that is a value corresponding to a correct answer array and the second array; and
an update step of generating a learning result by executing a predetermined learning process based on the objective function value.

An inference method executed by an inference device, the inference method comprising:
an attention step of using a first feature array based on a first array and a second feature array based on a second array to generate a weight matrix, which is a matrix representing the probability that each element of the first feature array and each element of the second feature array are in a correspondence relationship;
a decoding step of deriving the second array based on the first feature array and the weight matrix; and
an inference step of generating an inference result by executing a predetermined inference process based on the second array.

A program for causing a computer to function as the learning device according to any one of claims 1 to 7.

A program for causing a computer to function as the inference device according to claim 8 or 9.