JP7340199B2

JP7340199B2 - Learning device, reasoning device, learning method, reasoning method and program

Info

Publication number: JP7340199B2
Application number: JP2020123246A
Authority: JP
Inventors: 小萌武; 昭悟木村; 邦夫柏野; ガントゥグスアタルサイハン; 誠一内田
Original assignee: Kyushu University NUC; Nippon Telegraph and Telephone Corp
Current assignee: Kyushu University NUC; Nippon Telegraph and Telephone Corp
Priority date: 2020-07-17
Filing date: 2020-07-17
Publication date: 2023-09-07
Anticipated expiration: 2040-07-17
Also published as: JP2022019422A

Description

The present invention relates to a learning device, an inference device, a learning method, an inference method, and a program.

An array (that is, a sequence) is a series of data items arranged in order. Examples of arrays include speech signals, acoustic signals, and biological signals. Each data item in an array is a numerical value, a numerical vector, or the like, and is called an element of the array. Each element is identified by a subscript such as a natural number.

Array alignment refers to aligning the elements of each array so that mutually similar regions in multiple arrays can be identified. Because array alignment provides clues to the relationship between arrays, it is important in many applications such as motion recognition, speech analysis, biological signal classification, and signature authentication. In particular, array alignment is required when nonlinear temporal variation, in the form of local shifts and changes in speed, exists between two arrays. A representative method for array alignment is dynamic time warping (see Non-Patent Document 1).

In dynamic time warping, the distance between every pair of elements of the two arrays is derived. The correspondence between the elements of the two arrays is then detected so that the sum of the distances between corresponding elements is minimized. A correspondence is a pair of two elements that correspond to each other, or a pair of the subscripts of two such elements.
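The minimization described above can be sketched with the textbook dynamic-programming recurrence. This is a minimal illustration, not the formulation of Non-Patent Document 1: the function name `dtw_distance`, the unconstrained step pattern, and the caller-supplied `dist` function are assumptions made for the example.

```python
def dtw_distance(first, second, dist):
    """Minimal total distance over monotonic, continuous alignments.

    `dist` is a caller-supplied distance between two elements (an assumption
    of this sketch; the text does not fix a particular distance).
    """
    n, m = len(first), len(second)
    inf = float("inf")
    # cost[i][j]: best total distance aligning first[:i] with second[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(first[i - 1], second[j - 1])
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def absolute_difference(a, b):
    return abs(a - b)


same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3], absolute_difference)
reversed_shape = dtw_distance([1, 2, 3], [3, 2, 1], absolute_difference)
```

Each cell depends on its left, upper, and upper-left neighbours, so the table has to be filled in order. That dependency is what makes the computation hard to parallelize.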

Dynamic time warping is difficult to parallelize, which makes it hard to combine with deep learning. In addition, dynamic time warping depends on hand-designed feature representations, and its performance is insufficient when more complex feature representations are required. Dynamic time warping is therefore often not optimal for the application problem at hand.

In fields such as machine translation, speech synthesis, and voice conversion, methods that use an attention mechanism are available as array alignment methods that are easy to combine with deep learning (see Non-Patent Documents 2 and 3). For two arrays, a first array and a second array, the attention mechanism derives the weight of each element of the first array with respect to each element of the second array. Each derived weight represents the probability that the corresponding pair of elements of the first array and the second array is in correspondence. In array alignment using an attention mechanism, alignment is achieved by rearranging the elements of the first array based on the weight of each element of the first array with respect to each element of the second array.
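The rearrangement step can be sketched as a weighted sum, which is a common way to apply attention weights. The layout `weights[i][j]` (first-array element `i`, second-array element `j`) and the function name `reorder_first_array` are assumptions of this sketch, and the weights are supplied by hand rather than learned.

```python
def reorder_first_array(first, weights):
    """Return one vector per second-array position j: sum_i weights[i][j] * first[i]."""
    n_second = len(weights[0])
    dim = len(first[0])
    aligned = []
    for j in range(n_second):
        vec = [0.0] * dim
        for i, element in enumerate(first):
            w = weights[i][j]
            for d, x in enumerate(element):
                vec[d] += w * x
        aligned.append(vec)
    return aligned


first = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
# Column 0 puts all weight on first[2]; column 1 splits it between first[0] and first[1].
weights = [
    [0.0, 0.5],
    [0.0, 0.5],
    [1.0, 0.0],
]
aligned = reorder_first_array(first, weights)
```

When a column of the weight matrix has a single weight of 1, the weighted sum simply selects one element of the first array, which is the hard rearrangement case.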

Non-Patent Document 1: Hiroaki Sakoe and Seibi Chiba, "Dynamic programming algorithm optimization for spoken word recognition," IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. 26, No. 1, pp. 43-49, 1978.
Non-Patent Document 2: Dzmitry Bahdanau, KyungHyun Cho, and Yoshua Bengio, "Neural machine translation by jointly learning to align and translate," In ICLR, 2015.
Non-Patent Document 3: Minh-Thang Luong, Hieu Pham, and Christopher D. Manning, "Effective approaches to attention-based neural machine translation," In EMNLP, pp. 1412-1421, 2015.

The correspondence between the elements of two arrays is expressed by a function (hereinafter, the "correspondence function") whose independent variable is a subscript of the second array and whose dependent variable is the subscript of the first array that corresponds to it. In application problems such as matching or classification, the correspondence function of two arrays that belong to the same class is often monotonic and continuous. By contrast, the correspondence function of two arrays that belong to different classes is often non-monotonic or discontinuous.

This property makes it possible to determine whether two arrays belong to the same class. For example, a monotonic and continuous correspondence function is derived from the two arrays, and the sum of the distances between corresponding elements is derived. If this sum is large, the two arrays can be judged to belong to different classes.

Dynamic time warping is a representative array alignment method that exploits this property. However, dynamic time warping depends on hand-designed feature representations, and its performance is insufficient when more complex feature representations are required. Dynamic time warping is therefore often not optimal for the application problem at hand.

By contrast, an attention mechanism does not depend on hand-designed feature representations. Conventionally, however, an attention mechanism cannot be used to solve application problems such as matching or classification. One reason is that, even if a conventional attention mechanism derives the probability that each pair of elements of two arrays is in correspondence, the correspondence function cannot be derived from those probabilities. Another is that, even if a conventional attention mechanism were to derive a correspondence function, there is no way to guarantee that the function is monotonic and continuous.

Consequently, when the distance between arrays aligned with a conventional attention mechanism is applied to application problems such as matching or classification, the derived distance between arrays is often very small. Two arrays that belong to different classes therefore often cannot be distinguished correctly.

FIG. 10 is a diagram showing an example of a weight matrix. A weight matrix is a matrix representing the probability that each pair of elements of two arrays is in correspondence. In FIG. 10, the first array is "LISTEN" and the second array is "SILENT", as an example. An element of the weight matrix whose value is "1" indicates that the corresponding pair of elements is in correspondence.

The weight matrix shown on the left side of FIG. 10 is a weight matrix derived by a conventional attention mechanism. As it shows, a conventional attention mechanism derives a non-monotonic and discontinuous correspondence function. Even though the two arrays belong to different classes, the sum of the distances between corresponding elements is 0 in the weight matrix on the left side of FIG. 10, so the two arrays cannot be correctly distinguished.
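The "LISTEN" and "SILENT" case can be reproduced with a few lines of code. In this sketch, hard nearest-element matching stands in for the sharply peaked weights of a conventional attention mechanism. That substitution, the 0/1 letter distance, and the function names are assumptions made for illustration.

```python
first, second = "LISTEN", "SILENT"


def letter_distance(a, b):
    """0 for identical letters, 1 otherwise (an assumed toy distance)."""
    return 0 if a == b else 1


def unconstrained_correspondence(first, second, dist):
    """For each element of `second`, pick the subscript of the closest element of `first`."""
    return [min(range(len(first)), key=lambda i: dist(first[i], s)) for s in second]


correspondence = unconstrained_correspondence(first, second, letter_distance)
total = sum(letter_distance(first[i], s) for i, s in zip(correspondence, second))

print(correspondence)  # [2, 1, 0, 4, 5, 3]
print(total)           # 0
```

The subscripts jump backwards and forwards, so the correspondence function is neither monotonic nor continuous, yet the total distance is 0 and the two different words look identical.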

For this reason, application problems such as matching or classification require an array alignment method that can derive and use a monotonic and continuous correspondence function, like the arrangement of "1"s in the weight matrix shown on the right side of FIG. 10. With such an array alignment method, the distance or similarity between arrays is derived correctly, and whether the arrays belong to different classes can be inferred correctly.

In application problems such as speech synthesis or voice conversion, the goal is to convert a first array into a second array. Array alignment is required when nonlinear temporal variation, in the form of local shifts and changes in speed, exists between the first array and the second array. For example, when English speech by a Japanese speaker is converted into English speech by an American speaker, the tempo of the speech varies, so the arrays of speech signals must be aligned. That is, the correspondence between the elements of the two arrays must be estimated, the first array must be aligned using the estimated correspondence, and the aligned first array must be converted into the second array. In such cases too, the correspondence function between the two arrays is often monotonic and continuous.

However, methods that use a conventional attention mechanism have no function for guiding the learning of the mathematical model so that the attention mechanism can derive a monotonic and continuous correspondence function. As a result, a long training time is often needed before the attention mechanism delivers sufficient performance.

For this reason, the array alignment method described above is also required in application problems such as speech synthesis or voice conversion. Such an array alignment method makes it possible both to improve the inference accuracy of speech synthesis, voice conversion, and the like and to shorten the training time.

In view of the above circumstances, an object of the present invention is to provide a learning device, an inference device, a learning method, an inference method, and a program that can realize array alignment in which more complex feature representations can be derived and used without depending on hand-designed feature representations and, at the same time, a monotonic and continuous correspondence function can be derived and used.

One aspect of the present invention is a learning device including: an attention mechanism that uses a first feature array based on a first array and a second feature array based on a second array to generate a weight matrix, which is a matrix representing the probability that each pair of elements of the first feature array and the second feature array is in correspondence; an objective function value derivation unit that derives, based on the weight matrix, an objective function value that depends on the first feature array, the second feature array, and a label indicating whether the first array and the second array belong to the same class; and an update unit that generates a learning result by executing predetermined learning processing based on the objective function value.

One aspect of the present invention is a learning device including: an attention mechanism that uses a first feature array based on a first array and a second feature array based on a second array to generate a weight matrix, which is a matrix representing the probability that each pair of elements of the first feature array and the second feature array is in correspondence; a decoding unit that derives the element of the second array at the current time based on the first feature array and on the weight of each element of the first feature array with respect to the element of the second array at the current time; an objective function value derivation unit that derives an objective function value that depends on a correct (ground-truth) array and the second array; and an update unit that generates a learning result by executing predetermined learning processing based on the objective function value.

One aspect of the present invention is an inference device including: an attention mechanism that uses a first feature array based on a first array and a second feature array based on a second array to generate a weight matrix, which is a matrix representing the probability that each pair of elements of the first feature array and the second feature array is in correspondence; a matching unit that derives the distance between the first array and the second array based on the first feature array, the second feature array, and the weight matrix; and an inference unit that generates an inference result by executing predetermined inference processing based on the distance.

One aspect of the present invention is an inference device including: an attention mechanism that uses a first feature array based on a first array and a second feature array based on a second array to generate a weight matrix, which is a matrix representing the probability that each pair of elements of the first feature array and the second feature array is in correspondence; a decoding unit that derives the second array based on the first feature array and the weight matrix; and an inference unit that generates an inference result by executing predetermined inference processing based on the second array.

One aspect of the present invention is a learning method executed by a learning device, the learning method including: an attention step of using a first feature array based on a first array and a second feature array based on a second array to generate a weight matrix, which is a matrix representing the probability that each pair of elements of the first feature array and the second feature array is in correspondence; an objective function value derivation step of deriving, based on the weight matrix, an objective function value that depends on the first feature array, the second feature array, and a label indicating whether the first array and the second array belong to the same class; and an update step of generating a learning result by executing predetermined learning processing based on the objective function value.

One aspect of the present invention is an inference method executed by an inference device, the inference method including: an attention step of using a first feature array based on a first array and a second feature array based on a second array to generate a weight matrix, which is a matrix representing the probability that each pair of elements of the first feature array and the second feature array is in correspondence; a matching step of deriving the distance between the first array and the second array based on the first feature array, the second feature array, and the weight matrix; and an inference step of generating an inference result by executing predetermined inference processing based on the distance.

One aspect of the present invention is a learning method executed by a learning device, the learning method including: an attention step of using a first feature array based on a first array and a second feature array based on a second array to generate a weight matrix, which is a matrix representing the probability that each pair of elements of the first feature array and the second feature array is in correspondence; a decoding step of deriving the element of the second array at the current time based on the first feature array and on the weight of each element of the first feature array with respect to the element of the second array at the current time; an objective function value derivation step of deriving an objective function value that depends on a correct (ground-truth) array and the second array; and an update step of generating a learning result by executing predetermined learning processing based on the objective function value.

One aspect of the present invention is an inference method executed by an inference device, the inference method including: an attention step of using a first feature array based on a first array and a second feature array based on a second array to generate a weight matrix, which is a matrix representing the probability that each pair of elements of the first feature array and the second feature array is in correspondence; a decoding step of deriving the second array based on the first feature array and the weight matrix; and an inference step of generating an inference result by executing predetermined inference processing based on the second array.

One aspect of the present invention is a program for causing a computer to function as the learning device described above.

One aspect of the present invention is a program for causing a computer to function as the inference device described above.

The present invention makes it possible to realize array alignment in which more complex feature representations can be derived and used without depending on hand-designed feature representations and, at the same time, a monotonic and continuous correspondence function can be derived and used.

FIG. 1 is a diagram showing a configuration example of the inference device in the first embodiment.
FIG. 2 is a diagram showing a configuration example of the learning device in the first embodiment.
FIG. 3 is a diagram showing an example of a correspondence array in the first embodiment.
FIG. 4 is a diagram showing an example of deriving a monotonicity constraint function value in the first embodiment.
FIG. 5 is a diagram showing an example of deriving a continuity constraint function value in the first embodiment.
FIG. 6 is a diagram showing a configuration example of the inference device in the second embodiment.
FIG. 7 is a diagram showing a configuration example of the learning device in the second embodiment.
FIG. 8 is a diagram showing an example of the hardware configuration of the inference device in each embodiment.
FIG. 9 is a diagram showing an example of the hardware configuration of the learning device in each embodiment.
FIG. 10 is a diagram showing an example of a weight matrix.

Embodiments of the present invention will be described in detail with reference to the drawings.
In the following, an attention mechanism is used in application problems such as array matching or classification. This realizes array alignment in which more complex feature representations can be derived and used without depending on hand-designed feature representations.

In the following, a constraint function value representing at least one of a monotonicity constraint and a continuity constraint is newly proposed. Minimizing the constraint function value guides the learning of a mathematical model that includes the encoding unit and the attention mechanism, so that the attention mechanism can derive a monotonic and continuous correspondence function.

Hereinafter, the monotonicity constraint is the constraint that, where elements of the first array correspond to elements of the second array, the subscript (number) of the first-array element corresponding to a second-array element does not decrease as the subscript (number) of the second-array element increases. The continuity constraint is the constraint that, where elements of the first array correspond to elements of the second array and the subscripts (numbers) of adjacent elements of the second array are consecutive, the difference between the subscripts of the first-array elements corresponding to those adjacent second-array elements is no greater than a predetermined positive value.
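The two definitions can be stated as simple checks on a correspondence function given as a list, where item `j` is the first-array subscript that corresponds to second-array subscript `j`. This is a sketch of the definitions only, not of the constraint function values minimized during learning. The function names, the default bound `max_step=1`, and the use of the absolute difference are assumptions of the example.

```python
def is_monotonic(correspondence):
    """True if the first-array subscript never decreases as the second-array subscript grows."""
    return all(b >= a for a, b in zip(correspondence, correspondence[1:]))


def is_continuous(correspondence, max_step=1):
    """True if neighbouring second-array subscripts map to first-array subscripts
    that differ by at most `max_step` (the 'predetermined positive value')."""
    return all(abs(b - a) <= max_step for a, b in zip(correspondence, correspondence[1:]))


warped = [0, 0, 1, 2, 2, 3]      # a time-stretched but orderly correspondence
scrambled = [2, 1, 0, 4, 5, 3]   # a scrambled correspondence such as an anagram produces

print(is_monotonic(warped), is_continuous(warped))        # True True
print(is_monotonic(scrambled), is_continuous(scrambled))  # False False
```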

(First embodiment)
In the first embodiment, the learning method and the inference method are applied to application problems such as matching or classification. Examples of such application problems include motion recognition, speech recognition, biological signal classification, and signature authentication.

In the learning stage, the learning device trains a mathematical model using an attention mechanism. That is, the learning device uses training data to train a mathematical model that has a large number of parameters, and generates a trained mathematical model by determining the values of those parameters. In the execution stage, the inference device executes inference processing using the trained mathematical model. For example, the inference device executes a target task such as matching or classification.

First, the inference method applied in the execution stage to application problems such as matching or classification is described.

FIG. 1 is a diagram showing a configuration example of the inference device 1 in the first embodiment. In the execution stage of the first embodiment, the inference method is applied to application problems such as matching or classification. The inference device 1 acquires a first array and a second array as input. For example, in motion recognition, the inference device 1 acquires as input an array in which the coordinates of multiple feature points on a human body (for example, joint positions) are arranged in time order. In signature authentication, the inference device 1 acquires as input an array in which signature coordinates, pen pressure, or the like on the display of a signature collection device are arranged in time order. The inference device 1 derives the distance between the first array and the second array, executes inference processing based on the distance, and outputs the inference result to a predetermined external device (not shown).

The distance can be used to solve application problems such as matching or classification. For example, in a classification problem, the inference device 1 derives the distance between a training array whose class is known and a target array whose class is unknown. The inference device 1 then estimates the class of the target array using the k-nearest neighbors method, a support vector machine, or the like. In a search problem, the inference device 1 derives the distance between a query array and each array in a database, and derives the array with the shortest distance as the search result.
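Both uses of the distance can be sketched in a few lines. The functions `knn_classify` and `nearest` are hypothetical names, and `toy_distance` is a placeholder standing in for the distance derived by the inference device 1, which this sketch does not implement.

```python
from collections import Counter


def knn_classify(target, labeled_arrays, distance, k=3):
    """Majority vote among the k labeled arrays closest to `target`."""
    ranked = sorted(labeled_arrays, key=lambda item: distance(item[0], target))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def nearest(query, database, distance):
    """Search: return the database array with the shortest distance to `query`."""
    return min(database, key=lambda arr: distance(arr, query))


def toy_distance(a, b):
    """Placeholder distance for equal-length numeric arrays (an assumption)."""
    return sum(abs(x - y) for x, y in zip(a, b))


labeled = [
    ([0, 1, 2], "up"),
    ([1, 2, 3], "up"),
    ([3, 2, 1], "down"),
    ([2, 1, 0], "down"),
]
predicted = knn_classify([0, 1, 3], labeled, toy_distance, k=3)
hit = nearest([2, 1, 1], [arr for arr, _ in labeled], toy_distance)
```

Because both functions take the distance as an argument, any distance that separates classes well can be plugged in without changing the classification or search logic.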

The inference device 1 includes an encoding unit 10-1, an encoding unit 10-2, an attention mechanism 11, a matching unit 12, and an inference unit 13.

The functional units of the inference device 1 are described in detail below.
<Encoding unit 10>
The encoding unit 10-1 acquires the first array as input. The encoding unit 10-2 acquires the second array as input. The encoding unit 10-1 outputs a first feature array (first feature representation) to the attention mechanism 11 and the matching unit 12. The encoding unit 10-2 outputs a second feature array (second feature representation) to the attention mechanism 11 and the matching unit 12.

The operation of the encoding unit 10-1 is the same as that of the encoding unit 10-2, so only the operation of the encoding unit 10-1 is described below. Where a matter is common to the encoding unit 10-1 and the encoding unit 10-2, part of the reference sign is omitted and the unit is referred to as the "encoding unit 10". Based on the first array, the encoding unit 10 derives, as the first feature array, an array whose elements are numerical values or numerical vectors.

<First example of encoding unit 10>
In the first example of the encoding unit 10, the encoding unit 10 uses an artificial neural network to derive the first feature array from the first array. The parameters of the artificial neural network are determined from training data in the learning stage.

The processing in the first example of the encoding unit 10 is as follows.
In the first example, the encoding unit 10 changes the length of the first array to a predetermined length (for example, 1024). This is necessary so that batch learning or mini-batch learning can be used when the artificial neural network is trained. Each element of the first array is a one-dimensional numerical value or a multidimensional numerical vector.

For each dimension of the elements of the length-adjusted first array, the encoding unit 10 normalizes all values in that dimension so that their mean is 0 and their variance is 1. The normalized first array is, for example, a tensor of size "1×1024×5", where "1024" is an example of the array length and "5" is an example of the number of dimensions of each element.
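The length change and the per-dimension normalization can be sketched as follows. The text does not say how the length is changed, so the linear interpolation in `resize_linear` is an assumption, as are both function names. The target length of 8 in the usage lines is only for brevity.

```python
from statistics import fmean, pstdev


def resize_linear(seq, length):
    """Resample a list of equal-length vectors to `length` items by linear
    interpolation (the resampling scheme is an assumption of this sketch)."""
    if length < 1 or not seq:
        raise ValueError("need a non-empty array and a positive target length")
    if len(seq) == 1 or length == 1:
        return [list(seq[0]) for _ in range(length)]
    scale = (len(seq) - 1) / (length - 1)
    out = []
    for i in range(length):
        pos = i * scale
        lo = min(int(pos), len(seq) - 2)
        frac = pos - lo
        out.append([(1 - frac) * a + frac * b for a, b in zip(seq[lo], seq[lo + 1])])
    return out


def normalize_dimensions(seq):
    """Shift and scale each dimension to mean 0 and variance 1."""
    dims = list(zip(*seq))
    means = [fmean(d) for d in dims]
    stds = [pstdev(d) or 1.0 for d in dims]  # leave constant dimensions at 0
    return [[(x - m) / s for x, m, s in zip(vec, means, stds)] for vec in seq]


raw = [[0.0, 10.0], [1.0, 20.0], [4.0, 40.0]]
resized = resize_linear(raw, 8)
normalized = normalize_dimensions(resized)
```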

The encoding unit 10 inputs the normalized first array to a convolutional neural network. The convolutional neural network includes, for example, one "1×7×64" convolutional layer, one max pooling layer, and two "1×3×64" convolutional layers. Each convolutional layer is immediately followed by a batch normalization layer, and each batch normalization layer is followed by a ReLU layer as the activation function. The last ReLU layer outputs an array whose elements are multidimensional numerical vectors.

符号化部１０は、多次元の数値ベクトルを要素とする配列の各要素について、当該要素の全ての数値のＬ２ノルムが１になるように、当該要素の全ての数値を正規化する。符号化部１０は、正規化された配列を第１特徴配列として、注意機構１１と照合部１２とに出力する。符号化部１０の第１例では、畳み込みニューラルネットワークの代わりに、再帰型ニューラルネットワークなどが使用されてもよい。 For each element of an array whose elements are multidimensional numerical vectors, the encoding unit 10 normalizes all the numerical values of the element so that the L2 norm of all the numerical values of the element becomes 1. The encoding unit 10 outputs the normalized array to the attention mechanism 11 and the matching unit 12 as a first feature array. In the first example of the encoding unit 10, a recurrent neural network or the like may be used instead of a convolutional neural network.
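
The final L2 normalization of each element can be sketched as below. The function name `l2_normalize` and the handling of all-zero vectors are assumptions made for illustration; the convolutional layers themselves are not reproduced here.

```python
import math

def l2_normalize(features):
    """Scale every element (vector) of a feature sequence to unit L2 norm."""
    out = []
    for vec in features:
        norm = math.sqrt(sum(v * v for v in vec))
        # Assumption: an all-zero vector is returned unchanged to avoid division by zero.
        out.append([v / norm for v in vec] if norm > 0 else list(vec))
    return out

unit = l2_normalize([[3.0, 4.0], [0.0, 2.0]])
```

With unit-norm elements, the inner product of two elements equals their cosine similarity, which is relevant when the inner-product attention described later is used.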

＜符号化部１０の第２例＞
符号化部１０の第２例では、符号化部１０は、入力された第１配列を第１特徴配列として、注意機構１１と照合部１２とに出力する。符号化部１０の第２例では、符号化部１０は、パラメータを持たない。 <Second example of encoding unit 10>
In the second example of the encoding unit 10, the encoding unit 10 outputs the input first array to the attention mechanism 11 and the matching unit 12 as the first feature array. In the second example of the encoding unit 10, the encoding unit 10 does not have parameters.

＜注意機構１１＞
注意機構１１は、第１特徴配列を、符号化部１０－１から取得する。注意機構１１は、第２特徴配列を、符号化部１０－２から取得する。注意機構１１は、第１特徴配列の各要素と第２特徴配列の各要素とに基づいて、第２特徴配列の各要素に対する第１特徴配列の各要素の重みを導出する。第２特徴配列の各要素に対する、第１特徴配列の各要素の重みは、２個の要素が対応関係にある確率を表す。重みが大きいほど、２個の要素が対応関係にある確率が高い。注意機構１１は、重み行列を照合部１２に出力する。 <Attention mechanism 11>
The attention mechanism 11 acquires the first feature array from the encoding unit 10-1. The attention mechanism 11 acquires the second feature array from the encoding unit 10-2. The attention mechanism 11 derives the weight of each element of the first feature array with respect to each element of the second feature array based on each element of the first feature array and each element of the second feature array. The weight of each element of the first feature array with respect to each element of the second feature array represents the probability that the two elements have a corresponding relationship. The larger the weight, the higher the probability that two elements have a corresponding relationship. The attention mechanism 11 outputs the weight matrix to the matching unit 12.

＜注意機構１１の第１例＞
注意機構１１の第１例では、注意機構１１は、人工ニューラルネットワークを使用して、第１特徴配列の各要素と第２特徴配列の各要素とに基づいて、第２特徴配列の各要素に対する第１特徴配列の各要素の重みを導出する。学習段階において、人工ニューラルネットワークのパラメータは、学習データに基づいて決定される。 <First example of attention mechanism 11>
In the first example of the attention mechanism 11, the attention mechanism 11 uses an artificial neural network to derive the weight of each element of the first feature array with respect to each element of the second feature array, based on each element of the first feature array and each element of the second feature array. In the learning phase, the parameters of the artificial neural network are determined based on training data.

注意機構１１の第１例の処理の詳細は、以下の通りである。
注意機構１１の第１例では、注意機構１１は、第１特徴配列の各要素である数値ベクトルと、第２特徴配列の各要素である数値ベクトルとを、数値ベクトルの次元方向に沿って連結する。注意機構１１は、連結された数値ベクトルを、人工ニューラルネットワークに入力する。 Details of the first example of processing by the attention mechanism 11 are as follows.
In the first example of the attention mechanism 11, the attention mechanism 11 concatenates the numerical vector that is each element of the first feature array with the numerical vector that is each element of the second feature array along the dimension direction of the numerical vectors. The attention mechanism 11 inputs the concatenated numerical vector into the artificial neural network.

人工ニューラルネットワークは、例えば、３個の全結合層を備える。３個の全結合層において、１個目の全結合層が６４個の隠れユニットを有し、２個目の全結合層が１６個の隠れユニットを有し、３個目の全結合層が１個の隠れユニットを有する。１個目の全結合層の直後において、活性化関数としてＲｅＬＵ層が備えられる。２個目の全結合層の直後において、活性化関数としてＲｅＬＵ層が備えられる。３個目の全結合層は、１個の実数を出力する。 The artificial neural network comprises, for example, three fully connected layers. The first fully connected layer has 64 hidden units, the second fully connected layer has 16 hidden units, and the third fully connected layer has 1 hidden unit. A ReLU layer is provided as an activation function immediately after the first fully connected layer, and another ReLU layer is provided immediately after the second fully connected layer. The third fully connected layer outputs one real number.

第２特徴配列の各要素について、注意機構１１は、当該要素と第１特徴配列の各要素とを用いて導出された実数を全て含む配列を、Ｓｏｆｔｍａｘ関数を用いて正規化する。この導出された実数を全て含む配列とは、第１特徴配列の各要素に対して出力された実数を配列としてまとめたものである。導出された実数を全て含む配列は、第１特徴配列の要素数と同じ数の実数を含む。注意機構１１は、第２特徴配列の各要素に対する第１特徴配列の各要素の重みとして、正規化された実数を導出する。注意機構１１は、第２特徴配列の各要素に対する第１特徴配列の各要素の重みを全て含む行列を、重み行列として照合部１２に出力する。 For each element of the second feature array, the attention mechanism 11 normalizes an array including all real numbers derived using the element and each element of the first feature array using the Softmax function. The array including all the derived real numbers is an array of real numbers output for each element of the first feature array. The array containing all the derived real numbers contains the same number of real numbers as the number of elements of the first feature array. The attention mechanism 11 derives a normalized real number as the weight of each element of the first feature array with respect to each element of the second feature array. The attention mechanism 11 outputs a matrix including all the weights of each element of the first feature array for each element of the second feature array to the matching unit 12 as a weight matrix.
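
The first example of the attention mechanism can be sketched as follows: each pair of elements is concatenated, scored by a 64-16-1 fully connected stack, and the scores for one element of the second feature array are normalized with Softmax. The random initialization, the zero biases, and all function names are assumptions for illustration; in the described device the parameters are learned.

```python
import math
import random

def softmax(values):
    m = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def init_mlp(in_dim, sizes=(64, 16, 1), seed=0):
    """Random weights for the 64-16-1 stack (initialization is an assumption)."""
    rng = random.Random(seed)
    layers, prev = [], in_dim
    for size in sizes:
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(prev)] for _ in range(size)]
        layers.append((weights, [0.0] * size))
        prev = size
    return layers

def mlp_score(layers, vec):
    """ReLU after the first two layers; the last layer outputs one real number."""
    for index, (weights, biases) in enumerate(layers):
        vec = [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, biases)]
        if index < len(layers) - 1:
            vec = [max(0.0, v) for v in vec]
    return vec[0]

def mlp_attention(X, Y, layers):
    """Row i holds the weights of x_1..x_W with respect to y_i (each row sums to 1)."""
    return [softmax([mlp_score(layers, x_j + y_i) for x_j in X]) for y_i in Y]

X = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
Y = [[0.0, 1.0], [1.0, 0.0], [0.8, 0.6]]
P = mlp_attention(X, Y, init_mlp(in_dim=4))
```

Here `x_j + y_i` is list concatenation, so the network input has twice the dimension of a single element.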

＜注意機構１１の第２例＞
注意機構１１の第２例の処理の詳細は、以下の通りである。
注意機構１１の第２例では、注意機構１１は、第１特徴配列の各要素と第２特徴配列の各要素との内積を導出する。注意機構１１は、第２特徴配列の各要素について、第２特徴配列の各要素と第１特徴配列の各要素との内積を全て含む配列を、Ｓｏｆｔｍａｘ関数によって正規化する。注意機構１１は、第２特徴配列の各要素に対する第１特徴配列の各要素の重みとして、正規化された内積を導出する。注意機構１１は、第２特徴配列の各要素に対する第１特徴配列の各要素の重みを全て含む行列を、重み行列として照合部１２に出力する。 <Second example of attention mechanism 11>
Details of the second example of processing by the attention mechanism 11 are as follows.
In a second example of the attention mechanism 11, the attention mechanism 11 derives the inner product of each element of the first feature array and each element of the second feature array. For each element of the second feature array, the attention mechanism 11 normalizes an array including all inner products of each element of the second feature array and each element of the first feature array using a Softmax function. The attention mechanism 11 derives a normalized inner product as a weight of each element of the first feature array with respect to each element of the second feature array. The attention mechanism 11 outputs a matrix including all the weights of each element of the first feature array for each element of the second feature array to the matching unit 12 as a weight matrix.
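
The parameter-free second example can be sketched as below, assuming feature arrays are given as lists of equal-length vectors. The function names are hypothetical.

```python
import math

def softmax(values):
    m = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def dot_attention(X, Y):
    """Row i holds the Softmax-normalized inner products of y_i with x_1..x_W."""
    return [softmax([sum(a * b for a, b in zip(x_j, y_i)) for x_j in X]) for y_i in Y]

X = [[1.0, 0.0], [0.0, 1.0]]
Y = [[0.0, 1.0], [1.0, 0.0]]
P = dot_attention(X, Y)
```

In this small case, each element of `Y` receives its largest weight on the element of `X` that it matches, and every row of `P` sums to 1.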

注意機構１１の第２例では、注意機構１１は、パラメータを持たない。符号化部１０と注意機構１１とを含む数理モデルを学習するためには、数理モデルがパラメータを持たなければ、数理モデルを学習することができない。従って、符号化部１０の第２例が使用される場合には、注意機構１１の第２例を使用することはできない。すなわち、パラメータを持たない符号化部１０が使用される場合には、パラメータを持たない注意機構１１を使用することはできない。 In the second example of the attention mechanism 11, the attention mechanism 11 has no parameters. A mathematical model including the encoding unit 10 and the attention mechanism 11 cannot be trained unless it has parameters. Therefore, when the second example of the encoding unit 10 is used, the second example of the attention mechanism 11 cannot be used. That is, when an encoding unit 10 without parameters is used, an attention mechanism 11 without parameters cannot be used.

＜照合部１２＞
照合部１２は、第１特徴配列を符号化部１０－１から取得する。照合部１２は、第２特徴配列を符号化部１０－２から取得する。照合部１２は、重み行列を注意機構１１から取得する。照合部１２は、第１特徴配列と第２特徴配列と重み行列とに基づいて、第１配列と第２配列との間の距離を導出する。照合部１２は、第１配列と第２配列との間の距離（距離情報）を、推論部１３に出力する。なお、照合部１２は、所定の外部装置（不図示）に距離（距離情報）を出力してもよい。 <Matching unit 12>
The matching unit 12 obtains the first feature array from the encoding unit 10-1. The matching unit 12 obtains the second feature array from the encoding unit 10-2. The matching unit 12 acquires the weight matrix from the attention mechanism 11. The matching unit 12 derives the distance between the first array and the second array based on the first feature array, the second feature array, and the weight matrix. The matching unit 12 outputs the distance (distance information) between the first array and the second array to the inference unit 13. Note that the matching unit 12 may output the distance (distance information) to a predetermined external device (not shown).

＜照合部１２の第１例＞
照合部１２の第１例では、照合部１２は、重み行列を使用して、第１特徴配列の各要素に対して重み付けを実行する。照合部１２は、重み付けによって得られた新しい特徴配列を、変換特徴配列として導出する。照合部１２は、変換特徴配列と第２特徴配列との間の距離を、第１配列と第２配列との間の距離として導出する。 <First example of matching unit 12>
In the first example of the matching unit 12, the matching unit 12 weights each element of the first feature array using the weight matrix. The matching unit 12 derives the new feature array obtained by the weighting as a transformed feature array. The matching unit 12 derives the distance between the transformed feature array and the second feature array as the distance between the first array and the second array.

照合部１２の第１例の処理の詳細は、以下の通りである。
照合部１２の第１例では、照合部１２は、第２特徴配列の各要素について、第２特徴配列の各要素に対する第１特徴配列の各要素の重みを用いて、第１特徴配列の全ての要素の加重総和を導出する。これによって、第２特徴配列の各要素との対応関係にある第１特徴配列の要素が、加重総和として特定（抽出又は生成）される。すなわち、第２特徴配列の各要素との対応関係にある第１特徴配列の要素が整列される。従って、第１配列と第２配列との間に存在する局所的な変移と速度の変化とに関する非線形の時間変動が補償される。 Details of the first example of processing by the matching unit 12 are as follows.
In the first example of the matching unit 12, for each element of the second feature array, the matching unit 12 derives the weighted sum of all elements of the first feature array, using the weights of the elements of the first feature array with respect to that element of the second feature array. As a result, the element of the first feature array that corresponds to each element of the second feature array is specified (extracted or generated) as a weighted sum. That is, the elements of the first feature array are aligned with the corresponding elements of the second feature array. Therefore, the non-linear temporal variations in local displacement and speed that exist between the first array and the second array are compensated for.

照合部１２は、第２特徴配列の各要素（数値又は数値ベクトル）と、当該要素に対して導出された第１特徴配列の全ての要素の加重総和（数値又は数値ベクトル）との距離（例えば、ユークリッド距離）を、局所距離として導出する。第１配列と第２配列との間の時間変動が既に補償されているため、第２特徴配列の各要素と当該要素に対して導出された加重総和とが対応関係にある確率は高い。従って、第２特徴配列の各要素と当該要素に対して導出された加重総和との距離を照合部１２が導出することによって、第１特徴配列と第２特徴配列との間の局所的な差異をより正しく表す距離を照合部１２が導出することが可能になる。 The matching unit 12 derives, as a local distance, the distance (for example, the Euclidean distance) between each element (a numerical value or numerical vector) of the second feature array and the weighted sum (a numerical value or numerical vector) of all elements of the first feature array derived for that element. Since the temporal variation between the first array and the second array has already been compensated for, there is a high probability that each element of the second feature array and the weighted sum derived for that element correspond to each other. Therefore, by deriving the distance between each element of the second feature array and the weighted sum derived for that element, the matching unit 12 can derive a distance that more accurately represents the local difference between the first feature array and the second feature array.

照合部１２は、第２特徴配列の全ての要素に関する全ての局所距離の総和又は平均を導出する。照合部１２は、局所距離の総和又は平均を、第１配列と第２配列との間の距離として推論部１３に出力する。ここで、第１特徴配列は「Ｘ∈Ｒ^Ｗ×Ｋ」と表記され、第２特徴配列は「Ｙ∈Ｒ^Ｗ×Ｋ」と表記される。「Ｗ」は、特徴配列の長さを表す。「Ｋ」は、特徴配列の要素である数値又は数値ベクトルの次元数を表す。「Ｘ」のｊ番目の行ベクトル「ｘ_ｊ∈Ｒ^１×Ｋ」は、「Ｘ」のｊ番目の要素を表す。同様に、「Ｙ」のｉ番目の行ベクトル「ｙ_ｉ∈Ｒ^１×Ｋ」は、「Ｙ」のｉ番目の要素を表す。 The matching unit 12 derives the sum or average of all local distances for all elements of the second feature array. The matching unit 12 outputs the sum or average of the local distances to the inference unit 13 as the distance between the first array and the second array. Here, the first feature array is written as "X ∈ R^{W×K}", and the second feature array is written as "Y ∈ R^{W×K}". "W" represents the length of a feature array. "K" represents the number of dimensions of the numerical value or numerical vector that is an element of a feature array. The j-th row vector "x_j ∈ R^{1×K}" of "X" represents the j-th element of "X". Similarly, the i-th row vector "y_i ∈ R^{1×K}" of "Y" represents the i-th element of "Y".

重み行列は「Ｐ∈Ｒ^Ｗ×Ｗ」と表記される。「Ｐ」のｉ番目の行ベクトル「ｐ_ｉ∈Ｒ^１×Ｗ」は、「ｙ_ｉ」に対する「ｘ_１，…，ｘ_Ｗ」の重み「ｐ_ｉ１，…，ｐ_ｉＷ」を含む。「ｐ_ｉ」のｊ番目の要素「ｐ_ｉｊ」は、「ｙ_ｉ」に対する「ｘ_ｊ」の重みを表す。 The weight matrix is written as "P ∈ R^{W×W}". The i-th row vector "p_i ∈ R^{1×W}" of "P" contains the weights "p_{i1}, …, p_{iW}" of "x_1, …, x_W" with respect to "y_i". The j-th element "p_{ij}" of "p_i" represents the weight of "x_j" with respect to "y_i".

「ｐ_ｉ」がＳｏｆｔｍａｘ関数によって正規化されているので、「ｐ_ｉ１，…，ｐ_ｉＷ」の合計は１である。従って、第１配列と第２配列との間の距離は、式（１）のように表される。 Since "p_i" has been normalized by the Softmax function, the sum of "p_{i1}, …, p_{iW}" is 1. Therefore, the distance between the first array and the second array is expressed as in equation (1).
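
Equation (1) is not reproduced in this text. A form consistent with the surrounding definitions, assuming the sum (rather than the average) of the local distances is used and writing the distance as D(X, Y) (a symbol introduced here for illustration), would be:

```latex
D(X, Y) = \sum_{i=1}^{W} \left\lVert p_i X - y_i \right\rVert \tag{1}
```

Dividing the right-hand side by W would give the averaged variant mentioned in the text.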

ここで、「ｐ_ｉＸ」は、「ｙ_ｉ」に対する「ｘ_１，…，ｘ_Ｗ」の加重総和を表す。「||ｐ_ｉＸ－ｙ_ｉ||」は、「ｐ_ｉＸ」と「ｙ_ｉ」との間のユークリッド距離、すなわち局所距離を表す。 Here, "p_i X" represents the weighted sum of "x_1, …, x_W" with respect to "y_i". "||p_i X − y_i||" represents the Euclidean distance between "p_i X" and "y_i", that is, the local distance.
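
The first example of the matching unit can be sketched as below. The function name `matching_distance_v1` is hypothetical, and the sum (not the average) of the local distances is assumed.

```python
import math

def matching_distance_v1(X, Y, P):
    """Sum over i of the Euclidean distance between the weighted sum p_i X and y_i."""
    total = 0.0
    for y_i, p_i in zip(Y, P):
        # Weighted sum of all elements of X, using the weights for this y_i.
        warped = [sum(p_ij * x_j[k] for p_ij, x_j in zip(p_i, X)) for k in range(len(y_i))]
        total += math.dist(warped, y_i)
    return total

X = [[1.0, 0.0], [0.0, 1.0]]
Y = [[0.0, 1.0], [1.0, 0.0]]
aligned = matching_distance_v1(X, Y, [[0.0, 1.0], [1.0, 0.0]])  # weights pick the matching element
uniform = matching_distance_v1(X, Y, [[0.5, 0.5], [0.5, 0.5]])  # weights carry no alignment
```

When the weights select the matching elements, the distance drops to zero; uninformative weights leave a residual distance.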

＜照合部１２の第２例＞
照合部１２の第２例では、照合部１２は、第１特徴配列の各要素と第２特徴配列の各要素との間の距離を導出する。照合部１２は、重み行列を使用して、距離に対して重み付けを実行する。照合部１２は、重みに基づいて、第１配列と第２配列との間の距離を導出する。 <Second example of matching unit 12>
In the second example of the matching unit 12, the matching unit 12 derives the distance between each element of the first feature array and each element of the second feature array. The matching unit 12 weights the distance using a weight matrix. The matching unit 12 derives the distance between the first array and the second array based on the weight.

照合部１２の第２例の処理の詳細は、以下の通りである。
照合部１２の第２例では、照合部１２は、第１特徴配列の各要素と第２特徴配列の各要素との間の距離（例えば、ユークリッド距離）を、局所距離として導出する。照合部１２は、重み行列を使用して、局所距離の加重総和又は加重平均を導出する。照合部１２は、第１配列と第２配列との間の距離として、局所距離の加重総和又は加重平均を推論部１３に出力する。 Details of the second example of processing by the matching unit 12 are as follows.
In the second example of the matching unit 12, the matching unit 12 derives the distance (for example, Euclidean distance) between each element of the first feature array and each element of the second feature array as a local distance. The matching unit 12 uses the weight matrix to derive a weighted sum or weighted average of local distances. The matching unit 12 outputs the weighted sum or weighted average of local distances to the inference unit 13 as the distance between the first array and the second array.

第２特徴配列の各要素に対する第１特徴配列の各要素の重みは、２個の要素が対応関係にある確率を表す。重みが大きいほど、２個の要素が対応関係にある確率が高い。照合部１２は、対応関係にある確率の高い２個の要素に対して、２個の要素の間の局所距離に対してより大きい重みを付与する。照合部１２は、対応関係にある確率の低い２個の要素に対して、２個の要素の間の局所距離に対してより小さい重みを付与する。 The weight of each element of the first feature array with respect to each element of the second feature array represents the probability that the two elements correspond to each other. The larger the weight, the higher the probability that the two elements correspond. The matching unit 12 assigns a larger weight to the local distance between two elements that have a high probability of corresponding to each other. The matching unit 12 assigns a smaller weight to the local distance between two elements that have a low probability of corresponding to each other.

これによって、第１配列と第２配列との間に存在する局所的な変移と速度の変化とに関する非線形の時間変動が補償される。また、第１配列と第２配列との間の距離が、より正しく導出される。 This compensates for non-linear temporal variations in local displacements and velocity changes that exist between the first and second arrays. Furthermore, the distance between the first array and the second array can be derived more accurately.

照合部１２の第１例と同様に、照合部１２の第２例では、第１特徴配列は「Ｘ∈Ｒ^Ｗ×Ｋ」と表記され、第２特徴配列は「Ｙ∈Ｒ^Ｗ×Ｋ」と表記される。特徴配列の長さは「Ｗ」と表記される。「Ｘ」のｊ番目の要素が「ｘ_ｊ∈Ｒ^１×Ｋ」と表記され、「Ｙ」のｉ番目の要素は「ｙ_ｉ∈Ｒ^１×Ｋ」と表記される。重み行列は「Ｐ∈Ｒ^Ｗ×Ｗ」と表記される。「ｙ_ｉ」に対する「ｘ_ｊ」の重みは「ｐ_ｉｊ∈Ｐ」と表記される。従って、第１配列と第２配列との間の距離は、式（２）のように表される。 As in the first example of the matching unit 12, in the second example of the matching unit 12, the first feature array is written as "X ∈ R^{W×K}" and the second feature array is written as "Y ∈ R^{W×K}". The length of a feature array is written as "W". The j-th element of "X" is written as "x_j ∈ R^{1×K}", and the i-th element of "Y" is written as "y_i ∈ R^{1×K}". The weight matrix is written as "P ∈ R^{W×W}". The weight of "x_j" with respect to "y_i" is written as "p_{ij} ∈ P". Therefore, the distance between the first array and the second array is expressed as in equation (2).
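
Equation (2) is not reproduced in this text. A form consistent with the surrounding definitions, assuming the weighted sum (rather than the weighted average) of the local distances and writing the distance as D(X, Y) (a symbol introduced here for illustration), would be:

```latex
D(X, Y) = \sum_{i=1}^{W} \sum_{j=1}^{W} p_{ij} \left\lVert x_j - y_i \right\rVert \tag{2}
```

Because each row of P sums to 1, the weights sum to W in total, so the weighted-average variant would divide the right-hand side by W.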

ここで、「||ｘ_ｊ－ｙ_ｉ||」は、「ｘ_ｊ」と「ｙ_ｉ」との間のユークリッド距離、すなわち局所距離を表す。 Here, "||x_j − y_i||" represents the Euclidean distance between "x_j" and "y_i", that is, the local distance.
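
The second example of the matching unit can be sketched as below. The function name `matching_distance_v2` is hypothetical, and the weighted sum (not the weighted average) is assumed.

```python
import math

def matching_distance_v2(X, Y, P):
    """Sum over all pairs (i, j) of p_ij times the Euclidean distance between x_j and y_i."""
    return sum(
        p_ij * math.dist(x_j, y_i)
        for y_i, p_i in zip(Y, P)
        for p_ij, x_j in zip(p_i, X)
    )

X = [[1.0, 0.0], [0.0, 1.0]]
Y = [[0.0, 1.0], [1.0, 0.0]]
aligned = matching_distance_v2(X, Y, [[0.0, 1.0], [1.0, 0.0]])  # weights pick the matching element
uniform = matching_distance_v2(X, Y, [[0.5, 0.5], [0.5, 0.5]])  # weights carry no alignment
```

Unlike the first example, the weights are applied to the pairwise distances rather than to the elements of the first feature array.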

＜推論部１３＞
推論部１３は、第１配列と第２配列との間の距離として、局所距離の加重総和又は加重平均を、照合部１２から取得する。推論部１３は、第１配列と第２配列との間の距離に基づいて推論処理を実行する。推論部１３は、所定の外部装置（不図示）に推論結果を出力する。推論処理は、特定の推論処理に限定されない。例えば、複数人の手書き署名の筆者が推論される場合、筆者が未知である署名（第１配列）と筆者が既知である署名（第２配列）とが学習済の数理モデルに入力される。推論部１３は、照合部１２から取得された第１配列と第２配列との間の距離が最も短い第２配列の筆者ＩＤ（identification number）を、第１配列の筆者ＩＤ（推論結果）として出力する。各筆者について第２配列が複数存在する場合には、推論部１３は、距離の平均値が最も短い筆者ＩＤを、推論結果として出力してもよい。 <Inference unit 13>
The inference unit 13 obtains the weighted sum or weighted average of local distances from the matching unit 12 as the distance between the first array and the second array. The inference unit 13 performs inference processing based on the distance between the first array and the second array. The inference unit 13 outputs the inference result to a predetermined external device (not shown). The inference processing is not limited to any specific inference processing. For example, when the authors of handwritten signatures of multiple people are inferred, a signature whose author is unknown (first array) and signatures whose authors are known (second arrays) are input to the trained mathematical model. The inference unit 13 outputs, as the author ID (inference result) of the first array, the author ID (identification number) of the second array whose distance to the first array, obtained from the matching unit 12, is the shortest. If a plurality of second arrays exist for each author, the inference unit 13 may output the author ID with the smallest average distance as the inference result.
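
The signature example can be sketched as a nearest-reference decision. The sketch assumes the distances from the matching unit are already available as `(author_id, distance)` pairs; the function name `infer_author` and the sample values are hypothetical.

```python
from collections import defaultdict

def infer_author(distances):
    """Return the author ID whose reference signatures have the smallest average distance.

    `distances` holds one (author_id, distance) pair per known signature.
    """
    per_author = defaultdict(list)
    for author_id, distance in distances:
        per_author[author_id].append(distance)
    # With one reference per author this reduces to picking the shortest distance.
    return min(per_author, key=lambda a: sum(per_author[a]) / len(per_author[a]))

predicted = infer_author([("A", 0.9), ("A", 0.7), ("B", 0.4), ("B", 0.6), ("C", 1.2)])
```

Averaging over several references per author makes the decision less sensitive to a single atypical signature.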

次に、学習段階における、照合又は分類などの応用問題に適用される学習方法について説明する。 Next, a learning method applied to applied problems such as matching or classification in the learning stage will be explained.

図２は、第１実施形態における、学習装置２の構成例を示す図である。第１実施形態の学習段階では、照合又は分類などの応用問題に学習方法が適用される。学習装置２は、第１配列と第２配列とラベルとを、入力として取得する。学習装置２は、目的関数値と制約関数値とを導出する。学習装置２は、目的関数値と制約関数値とに基づいて、学習済の数理モデル（学習結果）を所定の外部装置（不図示）に出力する。また、学習装置２は、学習済の数理モデルを、実行段階よりも前に推論装置１に出力する。 FIG. 2 is a diagram showing a configuration example of the learning device 2 in the first embodiment. In the learning stage of the first embodiment, the learning method is applied to applied problems such as matching or classification. The learning device 2 obtains the first array, the second array, and the label as input. The learning device 2 derives an objective function value and a constraint function value. The learning device 2 outputs the learned mathematical model (learning result) to a predetermined external device (not shown) based on the objective function value and the constraint function value. Further, the learning device 2 outputs the learned mathematical model to the inference device 1 before the execution stage.

第１配列と第２配列とラベルとは、所定の目的（例えば、照合又は分類）のタスクを実行するための数理モデルを学習装置２が学習するために使用される学習データである。ラベルは、同じクラスに第１配列と第２配列とが属するか否かを表す。目的関数値と制約関数値とは、数理モデルを学習装置２が学習するために使用される。例えば、多数の学習データを使用して導出された目的関数値と制約関数値との加重総和又は加重平均が可能な限り小さくなるように（例えば、最小になるように）、学習装置２は数理モデルのパラメータを更新する。学習データの数が多いほど、数理モデルの性能が向上する。学習データの数は、例えば、２万から３万程度である。 The first array, the second array, and the label are training data used by the learning device 2 to train a mathematical model for executing a task with a predetermined purpose (for example, matching or classification). The label indicates whether the first array and the second array belong to the same class. The objective function value and the constraint function value are used by the learning device 2 to train the mathematical model. For example, the learning device 2 updates the parameters of the mathematical model so that the weighted sum or weighted average of the objective function value and the constraint function value derived using a large amount of training data becomes as small as possible (for example, is minimized). The larger the number of training data, the better the performance of the mathematical model. The number of training data is, for example, about 20,000 to 30,000.

学習装置２は、符号化部２０－１と、符号化部２０－２と、注意機構２１と、目的関数値導出部２２と、制約関数値導出部２３と、更新部２４とを備える。 The learning device 2 includes an encoding section 20-1, an encoding section 20-2, an attention mechanism 21, an objective function value deriving section 22, a constraint function value deriving section 23, and an updating section 24.

学習装置２の機能部の詳細を説明する。
＜符号化部２０＞
符号化部２０－１は、第１配列を入力として取得する。符号化部２０－２は、第２配列を入力として取得する。符号化部２０－１の動作は、符号化部２０－２の動作と同様である。学習段階における符号化部２０－１の処理は、実行段階における符号化部１０－１の処理と同じである。学習段階における符号化部２０－２の処理は、実行段階における符号化部１０－２の処理と同じである。 The details of the functional units of the learning device 2 will be explained.
<Encoding unit 20>
The encoding unit 20-1 obtains the first array as input. The encoding unit 20-2 obtains the second array as input. The operation of the encoding unit 20-1 is the same as that of the encoding unit 20-2. The processing of the encoding unit 20-1 in the learning stage is the same as the processing of the encoding unit 10-1 in the execution stage. The processing of the encoding unit 20-2 in the learning stage is the same as the processing of the encoding unit 10-2 in the execution stage.

符号化部２０－１は、第１特徴配列を注意機構２１と目的関数値導出部２２とに出力する。符号化部２０－２は、第２特徴配列を注意機構２１と目的関数値導出部２２とに出力する。以下では、符号化部２０－１と符号化部２０－２とに共通する事項については、符号の一部を省略して、「符号化部２０」と表記する。 The encoding unit 20-1 outputs the first feature array to the attention mechanism 21 and the objective function value deriving unit 22. The encoding unit 20-2 outputs the second feature array to the attention mechanism 21 and the objective function value deriving unit 22. Hereinafter, matters common to the encoding unit 20-1 and the encoding unit 20-2 are described using "encoding unit 20", with part of the reference sign omitted.

＜注意機構２１＞
注意機構２１は、第１特徴配列を符号化部２０－１から取得する。注意機構２１は、第２特徴配列を符号化部２０－２から取得する。学習段階における注意機構２１の処理は、実行段階における注意機構１１の処理と同じである。注意機構２１は、重み行列を目的関数値導出部２２と制約関数値導出部２３とに出力する。 <Attention mechanism 21>
The attention mechanism 21 acquires the first feature array from the encoding unit 20-1. The attention mechanism 21 acquires the second feature array from the encoding unit 20-2. The processing of the attention mechanism 21 in the learning stage is the same as the processing of the attention mechanism 11 in the execution stage. The attention mechanism 21 outputs the weight matrix to the objective function value deriving unit 22 and the constraint function value deriving unit 23.

＜目的関数値導出部２２＞
目的関数値導出部２２は、ラベルを入力として取得する。目的関数値導出部２２は、第１特徴配列と第２特徴配列とを、符号化部２０から取得する。目的関数値導出部２２は、重み行列を注意機構２１から取得する。目的関数値導出部２２は、第１特徴配列と第２特徴配列と重み行列とに基づいて、第１特徴配列と第２特徴配列との間の差分を導出する。目的関数値導出部２２は、導出された差分がラベルに関連付けられるように、目的関数値を導出する。 <Objective function value deriving unit 22>
The objective function value deriving unit 22 obtains the label as input. The objective function value deriving unit 22 obtains the first feature array and the second feature array from the encoding unit 20. The objective function value deriving unit 22 acquires the weight matrix from the attention mechanism 21. The objective function value deriving unit 22 derives the difference between the first feature array and the second feature array based on the first feature array, the second feature array, and the weight matrix. The objective function value deriving unit 22 derives an objective function value so that the derived difference is associated with a label.

同じクラスに第１配列と第２配列とが属する場合、差分が大きいほど、目的関数値が大きくなる。異なるクラスに第１配列と第２配列とが属する場合、差分が小さいほど、目的関数値が大きくなる。目的関数値導出部２２は、このような目的関数値を更新部２４に出力する。 When the first array and the second array belong to the same class, the larger the difference, the larger the objective function value. When the first array and the second array belong to different classes, the smaller the difference, the larger the objective function value. The objective function value deriving unit 22 outputs such an objective function value to the updating unit 24.

＜目的関数値導出部２２の第１例＞
実行段階において照合部１２の第１例が使用される場合、学習段階において、目的関数値導出部２２の第１例が使用されるほうが、目的関数値導出部２２の第２例が使用されるよりも望ましい。目的関数値導出部２２の第１例では、目的関数値導出部２２は、重み行列を使用して、第１特徴配列の各要素に対して重み付けを実行する。目的関数値導出部２２は、重み付けによって得られた新しい特徴配列を、変換特徴配列として導出する。目的関数値導出部２２は、変換特徴配列と第２特徴配列との間の差分を導出する。目的関数値導出部２２は、導出された差分がラベルに関連付けられるように、目的関数値を導出する。 <First example of objective function value deriving unit 22>
When the first example of the matching unit 12 is used in the execution stage, it is preferable to use the first example of the objective function value deriving unit 22, rather than its second example, in the learning stage. In the first example of the objective function value deriving unit 22, the objective function value deriving unit 22 weights each element of the first feature array using the weight matrix. The objective function value deriving unit 22 derives the new feature array obtained by the weighting as a transformed feature array. The objective function value deriving unit 22 derives the difference between the transformed feature array and the second feature array. The objective function value deriving unit 22 derives the objective function value so that the derived difference is associated with the label.

目的関数値導出部２２の第１例の処理の詳細は、以下の通りである。
目的関数値導出部２２の第１例では、目的関数値導出部２２は、第２特徴配列の各要素について、第２特徴配列の各要素に対する第１特徴配列の各要素の重みを用いて、第１特徴配列の全ての要素の加重総和を導出する。 Details of the first example of processing by the objective function value deriving unit 22 are as follows.
In the first example of the objective function value deriving unit 22, the objective function value deriving unit 22 uses, for each element of the second feature array, the weight of each element of the first feature array with respect to each element of the second feature array, Derive a weighted sum of all elements of the first feature array.

これによって、第２特徴配列の各要素との対応関係にある第１特徴配列の要素が、加重総和として特定（抽出又は生成）される。すなわち、第２特徴配列の各要素との対応関係にある第１特徴配列の要素が整列される。従って、第１配列と第２配列との間に存在する局所的な変移と速度の変化とに関する非線形の時間変動が補償される。 As a result, the element of the first feature array that corresponds to each element of the second feature array is specified (extracted or generated) as a weighted sum. That is, the elements of the first feature array are aligned with the corresponding elements of the second feature array. Therefore, the non-linear temporal variations in local displacement and speed that exist between the first array and the second array are compensated for.

目的関数値導出部２２は、第１特徴配列の全ての要素の加重総和（数値又は数値ベクトル）と、第２特徴配列の各要素（数値又は数値ベクトル）との距離（例えば、ユークリッド距離）を、局所距離として導出する。目的関数値導出部２２は、局所距離を用いて、局所目的関数値を導出する。同じクラスに第１配列と第２配列とが属する場合、局所距離が長いほど、局所目的関数値が大きくなる。異なるクラスに第１配列と第２配列とが属する場合、局所距離が短いほど、局所目的関数値が大きくなる。 The objective function value deriving unit 22 derives, as a local distance, the distance (for example, the Euclidean distance) between the weighted sum (a numerical value or numerical vector) of all elements of the first feature array and each element (a numerical value or numerical vector) of the second feature array. The objective function value deriving unit 22 derives a local objective function value using the local distance. When the first array and the second array belong to the same class, the longer the local distance, the larger the local objective function value. When the first array and the second array belong to different classes, the shorter the local distance, the larger the local objective function value.

目的関数値導出部２２は、第２特徴配列の全ての要素に関する全ての局所目的関数値の総和又は平均を導出する。目的関数値導出部２２は、局所目的関数値の総和又は平均を、目的関数値として更新部２４に出力する。ここで、第１特徴配列は「Ｘ∈Ｒ^Ｗ×Ｋ」と表記される。第２特徴配列は「Ｙ∈Ｒ^Ｗ×Ｋ」と表記される。特徴配列の長さは「Ｗ」と表記される。「Ｘ」のｊ番目の要素は「ｘ_ｊ∈Ｒ^１×Ｋ」と表記される。「Ｙ」のｉ番目の要素は「ｙ_ｉ∈Ｒ^１×Ｋ」と表記される。 The objective function value deriving unit 22 derives the sum or average of all local objective function values for all elements of the second feature array. The objective function value deriving unit 22 outputs the sum or average of the local objective function values to the updating unit 24 as the objective function value. Here, the first feature array is written as "X ∈ R^{W×K}". The second feature array is written as "Y ∈ R^{W×K}". The length of a feature array is written as "W". The j-th element of "X" is written as "x_j ∈ R^{1×K}". The i-th element of "Y" is written as "y_i ∈ R^{1×K}".

重み行列は、「Ｐ∈Ｒ^Ｗ×Ｗ」と表記される。「Ｐ」のｉ番目の行ベクトル「ｐ_ｉ∈Ｒ^１×Ｗ」は、「ｙ_ｉ」に対する「ｘ_１，…，ｘ_Ｗ」の重み「ｐ_ｉ１，…，ｐ_ｉＷ」を含む。ラベルが「ｚ∈｛０,１｝」と表記される。同じクラスに第１配列と第２配列とが属する場合に、ラベルが「ｚ＝１」となる。異なるクラスに第１配列と第２配列とが属する場合に、ラベルが「ｚ＝０」となる。従って、目的関数値は、式（３）のように表される。 The weight matrix is written as "P ∈ R^{W×W}". The i-th row vector "p_i ∈ R^{1×W}" of "P" contains the weights "p_{i1}, …, p_{iW}" of "x_1, …, x_W" with respect to "y_i". The label is written as "z ∈ {0, 1}". When the first array and the second array belong to the same class, the label is "z = 1". When the first array and the second array belong to different classes, the label is "z = 0". Therefore, the objective function value is expressed as in equation (3).

ここで、「ｐ_ｉＸ」は、「ｙ_ｉ」に対する「ｘ_１，…，ｘ_Ｗ」の加重総和を表す。「||ｐ_ｉＸ－ｙ_ｉ||」は、「ｐ_ｉＸ」と「ｙ_ｉ」との間のユークリッド距離、すなわち局所距離を表す。「τ」は、ハイパーパラメータであって、正の実数である。 Here, "p_i X" represents the weighted sum of "x_1, …, x_W" with respect to "y_i". "||p_i X − y_i||" represents the Euclidean distance between "p_i X" and "y_i", that is, the local distance. "τ" is a hyperparameter and is a positive real number.
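
Equation (3) is not reproduced in this text. The sketch below shows one objective that satisfies the stated properties, assuming a contrastive-style margin loss in which "τ" acts as the margin. The squared terms, the use of a sum rather than an average, and the function name `objective_v1` are assumptions, not the formula of the patent.

```python
import math

def objective_v1(X, Y, P, z, tau=1.0):
    """Sum of local objective values; z = 1 for the same class, z = 0 otherwise."""
    total = 0.0
    for y_i, p_i in zip(Y, P):
        warped = [sum(p_ij * x_j[k] for p_ij, x_j in zip(p_i, X)) for k in range(len(y_i))]
        d = math.dist(warped, y_i)  # local distance ||p_i X - y_i||
        # Same class: grows with d. Different class: grows as d falls below tau (assumed form).
        total += z * d ** 2 + (1 - z) * max(0.0, tau - d) ** 2
    return total

X = [[1.0, 0.0], [0.0, 1.0]]
Y = [[0.0, 1.0], [1.0, 0.0]]
P = [[0.0, 1.0], [1.0, 0.0]]
same_class = objective_v1(X, Y, P, z=1)   # local distances are zero, so nothing is penalized
diff_class = objective_v1(X, Y, P, z=0)   # zero distances are penalized up to the margin
```

Under this assumed form, pairs from different classes stop contributing once their local distance exceeds "τ".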

学習段階では、更新部２４は、多数の学習データを使用して導出された目的関数値と制約関数値との加重総和又は加重平均が可能な限り小さくなるように（例えば、最小になるように）、符号化部２０と注意機構２１とを含む数理モデルのパラメータを更新する。目的関数値が最小化されることによって、同じクラスに第１配列と第２配列とが属する場合において数理モデルが局所距離をより小さく導出するようにパラメータが更新される。 In the learning stage, the updating unit 24 updates the parameters of the mathematical model including the encoding unit 20 and the attention mechanism 21 so that the weighted sum or weighted average of the objective function value and the constraint function value derived using a large amount of training data becomes as small as possible (for example, is minimized). By minimizing the objective function value, the parameters are updated so that the mathematical model derives smaller local distances when the first array and the second array belong to the same class.

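同じクラスに第１配列と第２配列とが属する場合において第２特徴配列の各要素と類似する第１特徴配列の要素を数理モデルがより正しく特定できるように、目的関数値導出部２２の第１例の目的関数値に基づいて、パラメータが更新される。すなわち、同じクラスに第１配列と第２配列とが属する場合において第２特徴配列の各要素との対応関係にある第１特徴配列の要素を数理モデルがより正しく特定できるように、目的関数値導出部２２の第１例の目的関数値に基づいて、パラメータが更新される。 The parameters are updated based on the objective function value of the first example of the objective function value deriving unit 22 so that, when the first array and the second array belong to the same class, the mathematical model can more correctly identify the elements of the first feature array that are similar to each element of the second feature array. In other words, the parameters are updated based on the objective function value of the first example of the objective function value deriving unit 22 so that, when the first array and the second array belong to the same class, the mathematical model can more correctly identify the elements of the first feature array that correspond to each element of the second feature array.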

このように学習された数理モデルが使用されることによって、第１特徴配列の各要素と第２特徴配列の各要素との間の対応関係が、より正しく特定される。第１配列と第２配列との間の距離が、より正しく導出される。また、人手によって設計された特徴表現の使用に依存することなく、動的時間伸縮法と比べてより複雑な特徴表現を導出及び使用可能な配列整列が実現される。 By using the mathematical model learned in this way, the correspondence between each element of the first feature array and each element of the second feature array can be specified more accurately. The distance between the first array and the second array is more correctly derived. Furthermore, without relying on the use of manually designed feature representations, array alignment is realized that allows more complex feature representations to be derived and used than with dynamic time warping methods.

＜目的関数値導出部２２の第２例＞
実行段階において照合部１２の第２例が使用される場合、学習段階において、目的関数値導出部２２の第２例が使用されたほうが、目的関数値導出部２２の第１例が使用されるよりも望ましい。目的関数値導出部２２の第２例では、目的関数値導出部２２は、第１特徴配列の各要素と第２特徴配列の各要素との間の距離を導出する。目的関数値導出部２２は、重み行列を使用して、距離に対して重み付けを実行する。目的関数値導出部２２は、第１特徴配列と第２特徴配列との間の類似度を導出する。目的関数値導出部２２は、導出された類似度がラベルに関連付けられるように、目的関数値を導出する。 <Second example of objective function value deriving unit 22>
When the second example of the matching unit 12 is used in the execution stage, it is preferable to use the second example of the objective function value deriving unit 22, rather than its first example, in the learning stage. In the second example of the objective function value deriving unit 22, the objective function value deriving unit 22 derives the distance between each element of the first feature array and each element of the second feature array. The objective function value deriving unit 22 weights the distances using the weight matrix. The objective function value deriving unit 22 derives the degree of similarity between the first feature array and the second feature array. The objective function value deriving unit 22 derives the objective function value so that the derived degree of similarity is associated with the label.

目的関数値導出部２２の第２例の処理の詳細は、以下の通りである。 Details of the processing in the second example of the objective function value deriving unit 22 are as follows.
目的関数値導出部２２の第２例では、目的関数値導出部２２は、第１特徴配列の各要素と第２特徴配列の各要素の間の距離（例えば、ユークリッド距離）を、局所距離として導出する。目的関数値導出部２２は、重み行列を使用して、局所距離の加重総和又は加重平均を導出する。目的関数値導出部２２は、導出された加重総和又は加重平均がラベルに関連付けられるように、目的関数値を導出する。 In the second example of the objective function value deriving unit 22, the objective function value deriving unit 22 derives the distance (for example, the Euclidean distance) between each element of the first feature array and each element of the second feature array as a local distance. The objective function value deriving unit 22 derives a weighted sum or weighted average of the local distances using the weight matrix. The objective function value deriving unit 22 derives an objective function value such that the derived weighted sum or weighted average is associated with the label.

ここで、第１特徴配列が「Ｘ∈Ｒ^Ｗ×Ｋ」と表記される。第２特徴配列が「Ｙ∈Ｒ^Ｗ×Ｋ」と表記される。特徴配列の長さが「Ｗ」と表記される。「Ｘ」のｊ番目の要素が「ｘ_ｊ∈Ｒ^１×Ｋ」と表記される。「Ｙ」のｉ番目の要素が「ｙ_ｉ∈Ｒ^１×Ｋ」と表記される。重み行列が「Ｐ∈Ｒ^Ｗ×Ｗ」と表記される。「ｙ_ｉ」に対する「ｘ_ｊ」の重みが「ｐ_ｉｊ∈Ｐ」と表記される。ラベルが「ｚ∈｛０,１｝」と表記される。同じクラスに第１配列と第２配列とが属する場合に、ラベルが「ｚ＝１」となる。異なるクラスに第１配列と第２配列とが属する場合に、ラベルが「ｚ＝０」となる。従って、第１特徴配列と第２特徴配列との間の類似度は、式（４）のように表される。 Here, the first feature array is expressed as "X∈R^W×K". The second feature array is expressed as "Y∈R^W×K". The length of the feature array is expressed as "W". The j-th element of "X" is expressed as "x_j∈R^1×K". The i-th element of "Y" is expressed as "y_i∈R^1×K". The weight matrix is expressed as "P∈R^W×W". The weight of "x_j" with respect to "y_i" is expressed as "p_ij∈P". The label is expressed as "z∈{0,1}". When the first array and the second array belong to the same class, the label is "z=1". When the first array and the second array belong to different classes, the label is "z=0". Therefore, the degree of similarity between the first feature array and the second feature array is expressed as in equation (4).

ここで、「||ｘ_ｊ－ｙ_ｉ||」は、「ｘ_ｊ」と「ｙ_ｉ」との間のユークリッド距離、すなわち局所距離を表す。目的関数値は、式（５）のように表される。 Here, "||x_j−y_i||" represents the Euclidean distance between "x_j" and "y_i", that is, the local distance. The objective function value is expressed as in equation (5).
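The weighted sum or weighted average of local distances described above can be sketched as follows. This is a minimal illustration that follows the prose only, not a reproduction of equation (4) or equation (5); the function names are hypothetical, and dividing by the sum of all weights to obtain the weighted average is an assumption.

```python
import math


def local_distance(x_j, y_i):
    """Euclidean distance between two K-dimensional feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_j, y_i)))


def weighted_distance(X, Y, P, average=False):
    """Weighted sum (or weighted average) of all local distances.

    X, Y: lists of W feature vectors, each of length K.
    P:    W x W weight matrix; P[i][j] is the weight of X[j] for Y[i].
    """
    total = 0.0
    for i, y_i in enumerate(Y):
        for j, x_j in enumerate(X):
            total += P[i][j] * local_distance(x_j, y_i)
    if average:
        # Assumption: normalize by the sum of all weights.
        return total / sum(sum(row) for row in P)
    return total


# Hypothetical toy data: two feature arrays of length W=2, dimension K=2.
X = [[0.0, 0.0], [3.0, 4.0]]
Y = [[0.0, 0.0], [3.0, 4.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]   # weights on matching elements
swapped = [[0.0, 1.0], [1.0, 0.0]]   # weights on mismatched elements
d_aligned = weighted_distance(X, Y, aligned)
d_swapped = weighted_distance(X, Y, swapped)
```

With weights concentrated on matching elements the value is zero, and with weights on mismatched elements it grows, which is the behavior the objective function rewards for pairs of the same class.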

学習段階では、多数の学習データを使用して導出された目的関数値が可能な限り小さくなるように（例えば、最小になるように）、更新部２４は、符号化部２０と注意機構２１とを含む数理モデルのパラメータを更新する。目的関数値が最小化されることによって、同じクラスに第１配列と第２配列とが属する場合において数理モデルが局所距離をより小さく導出するようにパラメータが更新される。 In the learning stage, the updating unit 24 updates the parameters of the mathematical model including the encoding unit 20 and the attention mechanism 21 so that the objective function value derived using a large amount of learning data becomes as small as possible (for example, minimized). By minimizing the objective function value, the parameters are updated so that the mathematical model derives smaller local distances when the first array and the second array belong to the same class.

同じクラスに第１配列と第２配列とが属する場合、対応関係にある確率が高い２個の要素に対してより大きい重みが導出されるように、更新部２４は数理モデルのパラメータを更新する。同じクラスに第１配列と第２配列とが属する場合、対応関係にある確率が低い２個の要素に対してより小さい重みが導出されるように、更新部２４は数理モデルのパラメータを更新する。すなわち、第１特徴配列の各要素と第２特徴配列の各要素との間の対応関係がより正しく特定できるように、数理モデルのパラメータが更新される。 When the first array and the second array belong to the same class, the updating unit 24 updates the parameters of the mathematical model so that a larger weight is derived for two elements that have a high probability of being in a correspondence relationship. When the first array and the second array belong to the same class, the updating unit 24 updates the parameters of the mathematical model so that a smaller weight is derived for two elements that have a low probability of being in a correspondence relationship. That is, the parameters of the mathematical model are updated so that the correspondence between each element of the first feature array and each element of the second feature array can be specified more correctly.

このようにして学習された数理モデルが使用されることによって、第１特徴配列の各要素と第２特徴配列の各要素との間の対応関係がより正しく特定され、第１配列と第２配列との間の距離をより正しく導出することができる。また、人手によって設計された特徴表現の使用に依存することなく、動的時間伸縮法と比べてより複雑な特徴表現を導出及び使用可能な配列整列を実現することができる。 By using the mathematical model trained in this way, the correspondence between each element of the first feature array and each element of the second feature array is specified more correctly, and the distance between the first array and the second array can be derived more correctly. Moreover, without relying on the use of manually designed feature representations, it is possible to realize array alignment that allows more complex feature representations to be derived and used than with the dynamic time warping method.

＜制約関数値導出部２３＞ <Constraint function value deriving unit 23>
制約関数値導出部２３は、重み行列を注意機構２１から取得する。制約関数値導出部２３は、重み行列を使用して、制約関数値を導出する。制約関数値導出部２３は、単調性制約と連続性制約とのうちの少なくとも一方を満たす度合いが大きいほど制約関数値が小さくなるように、制約関数値を導出する。制約関数値導出部２３は、制約関数値を更新部２４に出力する。 The constraint function value deriving unit 23 acquires the weight matrix from the attention mechanism 21. The constraint function value deriving unit 23 derives a constraint function value using the weight matrix. The constraint function value deriving unit 23 derives the constraint function value such that the greater the degree to which at least one of the monotonicity constraint and the continuity constraint is satisfied, the smaller the constraint function value becomes. The constraint function value deriving unit 23 outputs the constraint function value to the updating unit 24.

符号化部２０と注意機構２１とを含む数理モデルは、制約関数値が最小化されることによって、第１特徴配列の各要素と第２特徴配列の各要素との間の対応関係が単調性制約と連続性制約とのうちの少なくとも一方を満たす重み行列を導出するように学習される。 By minimizing the constraint function value, the mathematical model including the encoding unit 20 and the attention mechanism 21 is trained to derive a weight matrix for which the correspondence between each element of the first feature array and each element of the second feature array satisfies at least one of the monotonicity constraint and the continuity constraint.

制約関数値導出部２３の処理の詳細は、以下の通りである。 Details of the processing of the constraint function value deriving unit 23 are as follows.
重み行列は、第１特徴配列の各要素と第２特徴配列の各要素とが対応関係にある確率を表す行列であり、対応関係そのものではない。従って、単調性制約と連続性制約とのうちの少なくとも一方が満たされる度合いに関して、度合いを重み行列から直接評価することはできない。 The weight matrix is a matrix representing the probability that each element of the first feature array and each element of the second feature array are in a correspondence relationship, and is not the correspondence relationship itself. Therefore, the degree to which at least one of the monotonicity constraint and the continuity constraint is satisfied cannot be directly evaluated from the weight matrix.

単調性制約と連続性制約とのうちの少なくとも一方が満たされる度合いを評価するために、対応関数のような形に重み行列を変換する必要がある。この対応関数は、例えば、第２特徴配列の各要素の添字を独立変数とし、第２特徴配列の各要素の添字との対応関係にある第１特徴配列の要素の添字を従属変数とした関数である。 In order to evaluate the degree to which at least one of the monotonicity constraint and the continuity constraint is satisfied, it is necessary to transform the weight matrix into a form like a correspondence function. This correspondence function is, for example, a function in which the independent variable is the subscript of each element of the second feature array, and the dependent variable is the subscript of the element of the first feature array that corresponds to that element of the second feature array.

そこで、制約関数値導出部２３は、重み行列と所定の等差数列との積を、対応配列として導出する。等差数列とは、隣り合う要素ごとに共通の差を持つ数列である。 Therefore, the constraint function value deriving unit 23 derives the product of the weight matrix and a predetermined arithmetic progression as a corresponding array. An arithmetic sequence is a sequence in which adjacent elements have a common difference.

図３は、第１実施形態における、対応配列の例を示す図である。図３における上側には、単調性制約と連続性制約とが満たされた場合について、重み行列の例と、等差数列の例と、対応配列の例とが表されている。図３における下側には、単調性制約と連続性制約とが満たされていない場合について、重み行列の例と、等差数列の例と、対応配列の例とが表されている。すなわち、等号の左辺には、重み行列と等差数列「［１,２,３,４］^Ｔ」との積が表されている。重み行列の各行は正規化済みであり、重み行列の各行では要素の合計が１である。等号の右辺には、対応配列が表されている。 FIG. 3 is a diagram showing an example of a corresponding array in the first embodiment. The upper part of FIG. 3 shows an example of a weight matrix, an example of an arithmetic progression, and an example of a corresponding array for the case where the monotonicity constraint and the continuity constraint are satisfied. The lower part of FIG. 3 shows an example of a weight matrix, an example of an arithmetic progression, and an example of a corresponding array for the case where the monotonicity constraint and the continuity constraint are not satisfied. That is, the left side of the equal sign represents the product of the weight matrix and the arithmetic progression "[1,2,3,4]^T". Each row of the weight matrix has been normalized, and the sum of the elements in each row of the weight matrix is 1. The corresponding array is shown on the right side of the equal sign.

等差数列を用いて導出された対応配列の添字は、第２特徴配列の各要素の添字（番号）を表す。対応配列の要素である数値は、第２特徴配列の各要素との対応関係にある第１特徴配列の要素の添字（番号）を表す。なお、対応配列の要素である数値は、第２特徴配列の各要素との対応関係にある第１特徴配列の要素の添字に比例する数値を表してもよい。 The subscript of the corresponding array derived using the arithmetic progression represents the subscript (number) of each element of the second feature array. Numerical values that are elements of the correspondence array represent subscripts (numbers) of elements of the first feature array that have a correspondence relationship with each element of the second feature array. Note that the numerical values that are the elements of the correspondence array may represent numerical values that are proportional to the subscripts of the elements of the first feature array that have a correspondence relationship with each element of the second feature array.

図３では、重み行列と等差数列とを使用して、対応配列が導出されている。例えば、図３における上側に表された例では、第２特徴配列の１番目の要素が第１特徴配列の１番目の要素との対応関係にあることを、対応配列が表している。第２特徴配列の２番目の要素が第１特徴配列の２番目の要素との対応関係にあることを、対応配列が表している。第２特徴配列の３番目の要素が第１特徴配列の２番目の要素との対応関係にあることを、対応配列が表している。 In FIG. 3, a corresponding array is derived using a weight matrix and an arithmetic progression. For example, in the example shown in the upper part of FIG. 3, the correspondence array indicates that the first element of the second feature array is in a correspondence relationship with the first element of the first feature array. The correspondence array indicates that the second element of the second feature array is in a correspondence relationship with the second element of the first feature array. The correspondence array indicates that the third element of the second feature array is in a correspondence relationship with the second element of the first feature array.

第２特徴配列の４番目の要素との対応関係にある第１特徴配列の要素の添字は、整数を用いて表されているのではなく、実数を用いて「３．６」と表されている。このような対応配列が使用されることによって、単調性制約と連続性制約とのうちの少なくとも一方が満たされる度合いを評価することが可能になる。 The subscript of the element of the first feature array that corresponds to the fourth element of the second feature array is not expressed using an integer, but is expressed as "3.6" using a real number. There is. By using such a correspondence array, it becomes possible to evaluate the degree to which at least one of the monotonicity constraint and the continuity constraint is satisfied.
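The product of the weight matrix and the arithmetic progression can be sketched as below. The weight matrix here is a hypothetical example chosen so that the result matches the values quoted in the text (1, 2, 2 and 3.6); it is not copied from FIG. 3, and the function name and its `start` and `step` parameters are assumptions for illustration.

```python
def correspondence_array(P, start=1, step=1):
    """Multiply a row-normalized weight matrix by an arithmetic progression.

    P[i][j] is the weight of the j-th first-feature element for the
    i-th second-feature element. The result F[i] is the (possibly
    fractional) index of the first-feature element matched to element i.
    """
    width = len(P[0])
    progression = [start + step * j for j in range(width)]
    return [sum(p * a for p, a in zip(row, progression)) for row in P]


# Hypothetical weight matrix; every row sums to 1.
P = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.4, 0.6],
]
F = correspondence_array(P)
F_rounded = [round(f, 6) for f in F]
```

Because each row is a probability distribution over indices, each entry of the result is the expected index under that distribution, which is why the fourth entry can be a non-integer value.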

対応配列を使用して導出される制約関数値は、単調性制約と連続性制約とのうちの少なくとも一方が満たされる度合いが大きいほど小さくなる必要がある。なお、勾配法を使用して学習装置２が数理モデルを学習するために、重み行列又は対応配列に対して制約関数値が微分可能であることが望ましい。また、より高速な学習を可能とするために、制約関数値の導出の並列化が容易であることが望ましい。 The constraint function value derived using the correspondence array needs to become smaller as the degree to which at least one of the monotonicity constraint and the continuity constraint is satisfied is greater. Note that in order for the learning device 2 to learn a mathematical model using the gradient method, it is desirable that the constraint function value is differentiable with respect to the weight matrix or the corresponding array. Furthermore, in order to enable faster learning, it is desirable that the derivation of constraint function values can be easily parallelized.

制約関数値導出部２３は、単調性制約関数値と連続性制約関数値とのうちの少なくとも一方を、制約関数値として導出する。 The constraint function value deriving unit 23 derives at least one of the monotonicity constraint function value and the continuity constraint function value as a constraint function value.

＜単調性制約関数値＞ <Monotonicity constraint function value>
制約関数値導出部２３は、対応配列の各要素について、対応配列の要素の１個前の要素と対応配列の要素との大きさを比較することによって、局所的な単調性制約の関数値（以下「局所単調性制約関数値」という。）を導出する。局所単調性制約関数値は、対応配列の要素の１個前の要素が対応配列の要素よりも大きい場合、これら２個の要素の差の絶対値となる。局所単調性制約関数値は、対応配列の要素の１個前の要素が対応配列の要素以下である場合、０となる。 For each element of the corresponding array, the constraint function value deriving unit 23 derives the function value of a local monotonicity constraint (hereinafter referred to as the "local monotonicity constraint function value") by comparing the magnitude of the element with that of the element immediately before it. When the immediately preceding element is larger than the element, the local monotonicity constraint function value is the absolute value of the difference between these two elements. When the immediately preceding element is less than or equal to the element, the local monotonicity constraint function value is 0.

制約関数値導出部２３は、対応配列における全ての要素に関する全ての局所単調性制約関数値の総和又は平均を導出する。制約関数値導出部２３は、局所単調性制約関数値の総和又は平均を、単調性制約関数値として更新部２４に出力する。 The constraint function value deriving unit 23 derives the sum or average of all local monotonicity constraint function values regarding all elements in the corresponding array. The constraint function value deriving unit 23 outputs the sum or average of the local monotonicity constraint function values to the updating unit 24 as a monotonicity constraint function value.

ここで、重み行列は「Ｐ∈Ｒ^Ｗ×Ｗ」と表記される。特徴配列の長さは「Ｗ」と表記される。対応配列は「Ｆ∈Ｒ^Ｗ×１」と表記される。「Ｆ」のｉ番目の要素は「ｆ_ｉ」と表記される。従って、単調性制約関数値は、式（６）のように表される。 Here, the weight matrix is expressed as "P∈R^W×W". The length of the feature array is expressed as "W". The corresponding array is expressed as "F∈R^W×1". The i-th element of "F" is expressed as "f_i". Therefore, the monotonicity constraint function value is expressed as in equation (6).

ここで、「ｆ_０」は０である。畳み込みニューラルネットワークのライブラリを使用して式（６）が実装されることによって、単調性制約関数値がより高速に導出される。 Here, "f ₀ " is 0. Equation (6) is implemented using a library of convolutional neural networks to derive the monotonicity constraint function value faster.

図４は、第１実施形態における、単調性制約関数値の導出例を示す図である。図４における上側には、単調性制約と連続性制約とが満たされた場合について、単調性制約関数値の導出例が表されている。図４における下側には、単調性制約と連続性制約とが満たされていない場合について、単調性制約関数値の導出例が表されている。 FIG. 4 is a diagram showing an example of deriving the monotonicity constraint function value in the first embodiment. The upper part of FIG. 4 shows an example of deriving the monotonicity constraint function value in the case where the monotonicity constraint and the continuity constraint are satisfied. The lower part of FIG. 4 shows an example of deriving the monotonicity constraint function value in a case where the monotonicity constraint and the continuity constraint are not satisfied.

図４には、左側から順に、対応配列の例と、フィルタの例と、対応配列において隣り合う２個の要素の差と、局所単調性制約関数値の例と、単調性制約関数値の例とが表されている。図４において、丸印に「×」の記号は畳み込みを表す。「損失」は単調性制約関数値を表す。対応配列が単調性制約を満たす度合いが大きいほど、より小さい単調性制約関数値が導出される。対応配列が単調性制約を満たす度合いが小さいほど、より大きい単調性制約関数値が導出される。 FIG. 4 shows, from the left, an example of a corresponding array, an example of a filter, a difference between two adjacent elements in a corresponding array, an example of a local monotonicity constraint function value, and an example of a monotonicity constraint function value. is expressed. In FIG. 4, a circle with an "x" symbol represents convolution. "Loss" represents the monotonicity constraint function value. The greater the degree to which the corresponding arrays satisfy the monotonicity constraint, the smaller the monotonicity constraint function value is derived. The smaller the degree to which the corresponding arrays satisfy the monotonicity constraint, the larger the monotonicity constraint function value is derived.

図４において、対応配列とフィルタ「［１,－１］^Ｔ」との畳み込みの結果として、対応配列において隣り合う２個の要素の差が導出される。制約関数値導出部２３は、隣り合う２個の要素の差の配列に対して、「ＲｅＬＵ」を活性化関数として適用する。このようにして、局所単調性制約関数値が導出される。局所単調性制約関数値の配列における全ての要素の平均が導出されることによって、式（６）のような単調性制約関数値が容易に導出される。 In FIG. 4, the difference between two adjacent elements in the corresponding array is derived as a result of the convolution of the corresponding array and the filter "[1,-1] ^T ". The constraint function value deriving unit 23 applies "ReLU" as an activation function to the array of differences between two adjacent elements. In this way, the local monotonicity constraint function value is derived. By deriving the average of all elements in the array of local monotonicity constraint function values, the monotonicity constraint function value as shown in Equation (6) can be easily derived.

なお、フィルタは、対応配列において位置が互いに近い２個の要素の差を導出可能な任意のフィルタでよい。例えば、「［１,０,－１］^Ｔ」又は「［２,１,－１,－２］^Ｔ」等のフィルタが、「［１,－１］^Ｔ」の代わりに使用されてもよい。 Note that the filter may be any filter capable of deriving the difference between two elements whose positions are close to each other in the corresponding array. For example, a filter such as "[1,0,-1] ^T " or "[2,1,-1,-2] ^T " may be used instead of "[1,-1] ^T " .
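The difference, ReLU and averaging steps described for FIG. 4 can be sketched as follows. This is one reading of the prose, using the mean of max(f_(i-1) - f_i, 0) with f_0 = 0; it is not a reproduction of equation (6), it uses a plain loop instead of a convolution library, and the function name and the sample arrays are hypothetical.

```python
def monotonicity_constraint(F, average=True):
    """Penalty that is 0 when the corresponding array never decreases.

    Each local value is max(previous - current, 0), i.e. ReLU applied to
    the difference produced by the filter [1, -1], with f_0 taken as 0.
    """
    previous = 0.0  # f_0 = 0, as stated in the text
    local_values = []
    for current in F:
        local_values.append(max(previous - current, 0.0))
        previous = current
    total = sum(local_values)
    return total / len(local_values) if average else total


# Hypothetical corresponding arrays.
loss_monotonic = monotonicity_constraint([1.0, 2.0, 2.0, 3.6])
loss_violating = monotonicity_constraint([1.0, 3.0, 2.0, 4.0])
```

Only the step from 3.0 down to 2.0 in the second array contributes a nonzero local value, so the penalty grows with the size of each backward step and ignores forward steps.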

＜連続性制約関数値＞ <Continuity constraint function value>
制約関数値導出部２３は、対応配列の各要素について、対応配列の要素の１個前の要素と対応配列の要素との差の絶対値を導出する。制約関数値導出部２３は、所定の正数を、導出された絶対値から減算する。この所定の正数は、ハイパーパラメータであり、例えば、１、２又は３などの正の整数である。「１．５」などの実数がハイパーパラメータとして使用されてもよい。 The constraint function value deriving unit 23 derives, for each element of the corresponding array, the absolute value of the difference between the element and the element immediately before it. The constraint function value deriving unit 23 subtracts a predetermined positive number from the derived absolute value. This predetermined positive number is a hyperparameter and is, for example, a positive integer such as 1, 2, or 3. A real number such as "1.5" may also be used as the hyperparameter.

制約関数値導出部２３は、減算結果の数値と０とのうちの最大値を、局所的な連続性制約の関数値（以下「局所連続性制約関数値」という。）として導出する。制約関数値導出部２３は、対応配列における全ての要素に関する全ての局所連続性制約関数値の総和又は平均を導出する。制約関数値導出部２３は、局所連続性制約関数値の総和又は平均を、連続性制約関数値として更新部２４に出力する。 The constraint function value deriving unit 23 derives the maximum value between the numerical value of the subtraction result and 0 as a function value of a local continuity constraint (hereinafter referred to as "local continuity constraint function value"). The constraint function value deriving unit 23 derives the sum or average of all local continuity constraint function values regarding all elements in the corresponding array. The constraint function value deriving unit 23 outputs the sum or average of the local continuity constraint function values to the updating unit 24 as a continuity constraint function value.

重み行列は「Ｐ∈Ｒ^Ｗ×Ｗ」と表記される。特徴配列の長さは「Ｗ」と表記される。対応配列は「Ｆ∈Ｒ^Ｗ×１」と表記される。「Ｆ」のｉ番目の要素は「ｆ_ｉ」と表記される。従って、連続性制約関数値は、式（７）のように表される。 The weight matrix is expressed as "P∈R^W×W". The length of the feature array is expressed as "W". The corresponding array is expressed as "F∈R^W×1". The i-th element of "F" is expressed as "f_i". Therefore, the continuity constraint function value is expressed as in equation (7).

ここで、「ｆ_０」は０である。畳み込みニューラルネットワークのライブラリを使用して式（７）が実装されることによって、連続性制約関数値がより高速に導出される。 Here, "f ₀ " is 0. Equation (7) is implemented using a library of convolutional neural networks to derive continuity constraint function values faster.

図５は、第１実施形態における、連続性制約関数値の導出例を示す図である。図５における上側には、単調性制約と連続性制約とが満たされた場合について、連続性制約関数値の導出例が表されている。図５における下側には、単調性制約と連続性制約とが満たされていない場合について、連続性制約関数値の導出例が表されている。 FIG. 5 is a diagram showing an example of deriving continuity constraint function values in the first embodiment. The upper part of FIG. 5 shows an example of deriving the continuity constraint function value in the case where the monotonicity constraint and the continuity constraint are satisfied. The lower part of FIG. 5 shows an example of deriving the continuity constraint function value in a case where the monotonicity constraint and the continuity constraint are not satisfied.

図５には、左側から順に、対応配列の例と、フィルタの例と、所定の正数の例と、対応配列において隣り合う２個の要素の差の絶対値から所定の正数が減算された結果と、局所連続性制約関数値の例と、連続性制約関数値の例とが表されている。図５において、丸印に「×」の記号は畳み込みを表す。「損失」は、連続性制約関数値を表す。 FIG. 5 shows, in order from the left, an example of a corresponding array, an example of a filter, an example of the predetermined positive number, the result of subtracting the predetermined positive number from the absolute value of the difference between two adjacent elements in the corresponding array, an example of local continuity constraint function values, and an example of a continuity constraint function value. In FIG. 5, a circle with an "x" symbol represents convolution. "Loss" represents the continuity constraint function value.

図５において、対応配列とフィルタ「［－１,１］^Ｔ」との畳み込みによって、対応配列において隣り合う２個の要素の差が導出される。制約関数値導出部２３は、隣り合う２個の要素の差の配列における各要素の絶対値を導出する。制約関数値導出部２３は、所定の正数（図５では、１）を、導出された絶対値から減算する。制約関数値導出部２３は、減算結果の配列に対して、「ＲｅＬＵ」を活性化関数として適用する。このようにして、局所連続性制約関数値が導出される。局所連続性制約関数値の配列における全ての要素の平均が導出されることによって、式（７）のような連続性制約関数値が容易に導出される。 In FIG. 5, the difference between two adjacent elements in the corresponding array is derived by convolving the corresponding array with the filter “[−1,1] ^T ”. The constraint function value deriving unit 23 derives the absolute value of each element in the array of differences between two adjacent elements. The constraint function value deriving unit 23 subtracts a predetermined positive number (1 in FIG. 5) from the derived absolute value. The constraint function value deriving unit 23 applies "ReLU" as an activation function to the subtraction result array. In this way, the local continuity constraint function value is derived. By deriving the average of all elements in the array of local continuity constraint function values, the continuity constraint function value as shown in equation (7) can be easily derived.

図５に表されているように、対応配列が連続性制約を満たす度合いが大きいほど、より小さい連続性制約関数値が導出される。対応配列が連続性制約を満たす度合いが小さいほど、より大きい連続性制約関数値が導出される。 As shown in FIG. 5, the greater the degree to which the corresponding array satisfies the continuity constraint, the smaller the continuity constraint function value is derived. The smaller the degree to which the corresponding array satisfies the continuity constraint, the larger the continuity constraint function value is derived.
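The continuity constraint described above can be sketched in the same way. This follows the prose only, using the mean of max(|f_i - f_(i-1)| - c, 0) with f_0 = 0, where c is the predetermined positive number; it is not a reproduction of equation (7), and the function name, the parameter name `margin` and the sample arrays are hypothetical.

```python
def continuity_constraint(F, margin=1.0, average=True):
    """Penalty that is 0 when no step between neighbors exceeds `margin`.

    Each local value is max(|current - previous| - margin, 0), i.e. ReLU
    applied after subtracting the hyperparameter from the absolute
    difference, with f_0 taken as 0.
    """
    previous = 0.0  # f_0 = 0, as stated in the text
    local_values = []
    for current in F:
        local_values.append(max(abs(current - previous) - margin, 0.0))
        previous = current
    total = sum(local_values)
    return total / len(local_values) if average else total


# Hypothetical corresponding arrays.
loss_smooth = continuity_constraint([1.0, 2.0, 3.0, 4.0])
loss_jump = continuity_constraint([1.0, 4.0, 4.0, 4.0])
```

In the second array only the jump from 1.0 to 4.0 exceeds the margin of 1, so it is the only step that contributes to the penalty. A larger `margin` tolerates larger skips in the alignment.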

＜更新部２４＞ <Updating unit 24>
更新部２４は、目的関数値を目的関数値導出部２２から取得する。更新部２４は、制約関数値を制約関数値導出部２３から取得する。更新部２４は、目的関数値と制約関数値とに基づいて学習処理を実行する。学習処理は、特定の学習処理に限定されない。更新部２４は、制約関数値と目的関数値との加重総和又は加重平均が可能な限り小さくなるように（例えば、最小になるように）、符号化部２０と注意機構２１とを含む数理モデルのパラメータを更新する。更新部２４は、所定の外部装置（不図示）に学習済の数理モデル（学習結果）を出力する。 The updating unit 24 acquires the objective function value from the objective function value deriving unit 22. The updating unit 24 acquires the constraint function value from the constraint function value deriving unit 23. The updating unit 24 executes learning processing based on the objective function value and the constraint function value. The learning processing is not limited to a specific learning processing. The updating unit 24 updates the parameters of the mathematical model including the encoding unit 20 and the attention mechanism 21 so that the weighted sum or weighted average of the constraint function value and the objective function value becomes as small as possible (for example, minimized). The updating unit 24 outputs the trained mathematical model (learning result) to a predetermined external device (not shown).
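The quantity minimized by the updating unit 24 can be sketched as a weighted combination of the objective function value and the constraint function values. The function name, the three weights and their default values are assumptions for illustration; the text does not specify how the weights are chosen, and it allows only one of the two constraint values to be used.

```python
def combined_loss(objective, monotonicity, continuity,
                  w_objective=1.0, w_monotonicity=1.0, w_continuity=1.0,
                  average=False):
    """Weighted sum (or weighted average) of the three loss terms.

    Setting a weight to 0 drops the corresponding term, which covers the
    case where only one of the two constraints is used.
    """
    total = (w_objective * objective
             + w_monotonicity * monotonicity
             + w_continuity * continuity)
    if average:
        return total / (w_objective + w_monotonicity + w_continuity)
    return total


# Hypothetical loss values and weights.
loss = combined_loss(2.0, 0.25, 0.5, w_monotonicity=2.0, w_continuity=4.0)
```

In a gradient-based setup this scalar would be the value that is differentiated with respect to the model parameters.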

以上のように、学習段階において、注意機構２１は、第１配列に基づく第１特徴配列と第２配列に基づく第２特徴配列とを用いて、第１特徴配列と第２特徴配列との各要素が対応関係にある確率を表す行列である重み行列を生成する。目的関数値導出部２２は、同じクラスに第１配列と第２配列とが属するか否かを表すラベルと第１特徴配列と第２特徴配列とに応じた値である目的関数値を、重み行列に基づいて導出する。制約関数値導出部２３は、単調性制約と連続性制約とのうちの少なくとも一方を表す制約関数値を、重み行列に基づいて導出する。更新部２４は、目的関数値と制約関数値とに基づいて所定の学習処理を実行することによって学習結果を生成する。目的関数値は、例えば、第１特徴配列と第２特徴配列との間の差分又は類似度と、ラベルとに応じた値である。更新部２４は、数理モデルを更新する。 As described above, in the learning stage, the attention mechanism 21 uses the first feature array based on the first array and the second feature array based on the second array to A weight matrix is generated, which is a matrix representing the probability that elements have a corresponding relationship. The objective function value deriving unit 22 calculates the objective function value, which is a value corresponding to the label indicating whether the first array and the second array belong to the same class, and the first feature array and the second feature array, using weights. Derive based on the matrix. The constraint function value deriving unit 23 derives a constraint function value representing at least one of the monotonicity constraint and the continuity constraint based on the weight matrix. The updating unit 24 generates a learning result by executing a predetermined learning process based on the objective function value and the constraint function value. The objective function value is, for example, a value according to the difference or similarity between the first feature array and the second feature array and the label. The updating unit 24 updates the mathematical model.

学習段階において更新された数理モデルは、実行段階において推論処理の実行に使用される。実行段階において、注意機構１１は、第１配列に基づく第１特徴配列と第２配列に基づく第２特徴配列とを用いて、第１特徴配列と第２特徴配列との各要素が対応関係にある確率を表す行列である重み行列を生成する。照合部１２は、第１特徴配列と第２特徴配列と重み行列とに基づいて、第１配列と第２配列との間の距離を導出する。推論部１３は、距離に基づいて所定の推論処理を実行することによって推論結果を生成する。 The mathematical model updated in the learning stage is used to execute inference processing in the execution stage. In the execution stage, the attention mechanism 11 uses the first feature array based on the first array and the second feature array based on the second array to establish a correspondence relationship between each element of the first feature array and the second feature array. Generate a weight matrix, which is a matrix that represents a certain probability. The matching unit 12 derives the distance between the first array and the second array based on the first feature array, the second feature array, and the weight matrix. The inference unit 13 generates an inference result by executing a predetermined inference process based on the distance.

このように、単調性制約と連続性制約とのうちの少なくとも一方を表す制約関数値を用いて学習された数理モデルを用いて符号化部が特徴配列を導出することによって、有効に働く重み行列を注意機構が特徴配列に基づいて生成する。 In this way, the encoding unit derives the feature array using the mathematical model learned using the constraint function value representing at least one of the monotonicity constraint and the continuity constraint, thereby creating an effective weight matrix. is generated by the attention mechanism based on the feature array.

これによって、人手によって設計された特徴表現の使用に依存することなく、より複雑な特徴表現を導出及び使用可能であると同時に、単調で連続的な対応関数を導出及び使用可能な配列整列を実現することが可能である。人手によって設計された特徴表現の使用に依存することなく、より複雑な特徴表現を実現することが可能である。また、推論精度の向上と学習時間の短縮とを両立させることが可能である。 This makes it possible to derive and use more complex feature representations without relying on the use of manually designed feature representations, and at the same time realizes array alignment that allows for the derivation and use of monotonic and continuous correspondence functions. It is possible to do so. It is possible to realize more complex feature representations without relying on the use of manually designed feature representations. Furthermore, it is possible to both improve inference accuracy and shorten learning time.

学習装置２、学習方法及びプログラムによれば、注意機構１１が単調で連続的な対応関数を導出できるように、更新部２４が数理モデルを学習する際に数理モデルの学習を誘導（ガイド）することが可能になる。学習済の数理モデルにおける注意機構１１が使用されることによって、照合又は分類などの応用問題において、配列間の距離又は類似度を正しく導出することが可能である。異なるクラスに属する配列であるか否かを正しく推論することが可能である。また、注意機構１１が十分な性能を提供できるようになるまでの学習時間（数理モデルの学習に必要とされる時間）を短縮することが可能になる。 According to the learning device 2, the learning method, and the program, when the updating unit 24 trains the mathematical model, it becomes possible to guide the training so that the attention mechanism 11 can derive a monotonic and continuous correspondence function. By using the attention mechanism 11 in the trained mathematical model, it is possible to correctly derive the distance or similarity between arrays in applied problems such as matching or classification. It is possible to correctly infer whether or not arrays belong to different classes. Furthermore, it becomes possible to shorten the learning time (the time required for training the mathematical model) until the attention mechanism 11 can provide sufficient performance.

（第２実施形態） (Second embodiment)
第２実施形態は、音声等の連続データの合成又は変換などの応用問題に学習方法及び推論方法を適用するための実施形態である。音声合成とは、人間の音声を人工的に作り出すことであり、例えば、音声を文章から合成することである。音声変換とは、個人の音声を別の個人又はキャラクタの音声に変換することである。 The second embodiment is an embodiment for applying the learning method and the inference method to applied problems such as synthesis or conversion of continuous data such as speech. Speech synthesis refers to artificially creating human speech, for example, synthesizing speech from text. Voice conversion is the conversion of an individual's voice into the voice of another individual or character.

なお、連続データとなるように不連続データ（例えば、手書き署名）が予め補正されるのであれば、第２実施形態における学習方法及び推論方法を不連続データに対して使うことは可能である。 Note that if the discontinuous data (for example, a handwritten signature) is corrected in advance so as to become continuous data, the learning method and inference method in the second embodiment can be used for the discontinuous data.

第２実施形態は、学習段階と実行段階とに分けられる。学習段階では、学習装置は、学習データを使用して、多数のパラメータを持つ数理モデルを学習する。学習装置は、数理モデルのパラメータの数値を決定する。実行段階では推論装置は、学習済の数理モデルを使用して、所定の目的（例えば、音声合成、音声変換）のタスクを実行する。 The second embodiment is divided into a learning stage and an execution stage. In the learning phase, the learning device uses training data to learn a mathematical model with a large number of parameters. The learning device determines numerical values of parameters of the mathematical model. In the execution stage, the inference device uses the learned mathematical model to execute a task for a predetermined purpose (eg, speech synthesis, speech conversion).

まず、実行段階における、音声合成又は音声変換などの応用問題に適用される推論方法について説明する。 First, an inference method applied to applied problems such as speech synthesis or speech conversion in the execution stage will be explained.

図６は、第２実施形態における、推論装置３の構成例を示す図である。音声合成では、第１配列の要素は、例えば、文章の各単語の特徴を表す数値ベクトルである。文章の各単語の特徴は、例えば、単語のＯｎｅ－Ｈｏｔベクトルである。第２配列の要素は、例えば、音声の各時刻又は各フレームの特徴を表す数値ベクトルである。 FIG. 6 is a diagram showing a configuration example of the inference device 3 in the second embodiment. In speech synthesis, the elements of the first array are, for example, numerical vectors representing the characteristics of each word in a sentence. The feature of each word in a sentence is, for example, the word's One-Hot vector. The elements of the second array are, for example, numerical vectors representing the characteristics of each time or frame of audio.

音声変換では、第１配列の要素は、例えば、音声の各時刻又は各フレームの特徴を表す数値ベクトルである。音声の各時刻又は各フレームの特徴は、例えば、所定の抽出方法（参考文献１：Masanori Morise, Fumiya Yokomori, Kenji Ozawa, "WORLD: A vocoder-based high-quality speech synthesis system for real-time applications, " IEICE Trans. Inf. Syst. 99-D (7): 1877-1884 (2016)）を用いて抽出された、メルケプストラム係数と対数Ｆ０パターンとを含む多次元ベクトルである。第２配列の要素は、例えば、第１配列の音声の個人とは別の個人又はキャラクタの音声における、各時刻又は各フレームの特徴を表す数値ベクトルである。 In audio conversion, the elements of the first array are, for example, numerical vectors representing the characteristics of each time or frame of audio. The characteristics of each time or frame of audio can be extracted using, for example, a predetermined extraction method (Reference 1: Masanori Morise, Fumiya Yokomori, Kenji Ozawa, "WORLD: A vocoder-based high-quality speech synthesis system for real-time applications, "IEICE Trans. Inf. Syst. 99-D (7): 1877-1884 (2016)) is a multidimensional vector containing mel cepstral coefficients and a logarithm F0 pattern. The elements of the second array are, for example, numerical vectors representing the characteristics of each time or each frame in the voice of an individual or character different from the individual in the voice of the first array.

推論装置３は、第１符号化部３０と、第２符号化部３１と、注意機構３２と、復号化部３３と、推論部３４とを備える。 The inference device 3 includes a first encoding unit 30, a second encoding unit 31, an attention mechanism 32, a decoding unit 33, and an inference unit 34.

第１符号化部３０は、第１配列を入力として取得する。第１符号化部３０は、第１配列に対する符号化処理を例えば１回だけ実行することによって、第１特徴配列を導出する。第１符号化部３０は、第１特徴配列を注意機構３２と復号化部３３とに出力する。 The first encoding unit 30 acquires the first array as input. The first encoding unit 30 derives the first feature array by performing encoding processing on the first array, for example, only once. The first encoding unit 30 outputs the first feature array to the attention mechanism 32 and the decoding unit 33.

第２符号化部３１は、１個前の時刻における第２配列の要素を、復号化部３３から取得する。第２符号化部３１は、１個前の時刻における第２配列の要素に対する符号化処理を実行することによって、１個前の時刻における第２特徴配列の要素を導出する。第２符号化部３１は、１個前の時刻における第２特徴配列の要素を、注意機構３２に出力する。 The second encoding unit 31 acquires the elements of the second array at the previous time from the decoding unit 33. The second encoding unit 31 derives the elements of the second feature array at the previous time by performing encoding processing on the elements of the second array at the previous time. The second encoding unit 31 outputs the elements of the second feature array at the previous time to the attention mechanism 32.

注意機構３２は、第１特徴配列を、第１符号化部３０から取得する。注意機構３２は、１個前の時刻における第２特徴配列の要素を、第２符号化部３１から取得する。注意機構３２は、１個前の時刻における第２特徴配列の要素と第１特徴配列の各要素とを使用して、現在の時刻における第２配列の要素に対する第１特徴配列の各要素の重みを導出する。注意機構３２は、現在の時刻における第２配列の要素に対する第１特徴配列の各要素の重みを、重み行列として復号化部３３に出力する。 The attention mechanism 32 acquires the first feature array from the first encoding unit 30. The attention mechanism 32 acquires the elements of the second feature array at the previous time from the second encoding unit 31. The attention mechanism 32 uses the elements of the second feature array at the previous time and each element of the first feature array to derive the weight of each element of the first feature array with respect to the element of the second array at the current time. The attention mechanism 32 outputs the weight of each element of the first feature array with respect to the element of the second array at the current time to the decoding unit 33 as a weight matrix.

復号化部３３は、第１特徴配列を第１符号化部３０から取得する。復号化部３３は、現在の時刻における第２配列の要素に対する第１特徴配列の各要素の重みを、重み行列として注意機構３２から取得する。復号化部３３は、現在の時刻における第２配列の要素に対する第１特徴配列の各要素の重みと、第１特徴配列とに基づいて、現在の時刻における第２配列の要素を導出する。復号化部３３は、現在の時刻における第２配列の要素を、第２符号化部３１と推論部３４とに出力する。なお、復号化部３３は、現在の時刻における第２配列の要素を、所定の外部装置（不図示）に出力してもよい。 The decoding unit 33 acquires the first feature array from the first encoding unit 30. The decoding unit 33 obtains the weight of each element of the first feature array with respect to the element of the second array at the current time from the attention mechanism 32 as a weight matrix. The decoding unit 33 derives the elements of the second array at the current time based on the weight of each element of the first feature array with respect to the elements of the second array at the current time and the first feature array. The decoding unit 33 outputs the elements of the second array at the current time to the second encoding unit 31 and the inference unit 34. Note that the decoding unit 33 may output the elements of the second array at the current time to a predetermined external device (not shown).

第２符号化部３１は、現在の時刻における第２配列の要素を、復号化部３３から取得する。第２符号化部３１は、現在の時刻における第２配列の要素を使用して、現在の時刻における第２特徴配列の要素を導出する。第２符号化部３１は、現在の時刻における第２特徴配列の要素を、注意機構３２に出力する。 The second encoding unit 31 obtains the elements of the second array at the current time from the decoding unit 33. The second encoding unit 31 uses the elements of the second array at the current time to derive the elements of the second feature array at the current time. The second encoding unit 31 outputs the elements of the second feature array at the current time to the attention mechanism 32.

このように、信号が第２符号化部３１から出発し、注意機構３２と復号化部３３とを信号が経由し、第２符号化部３１に信号が再び戻るという循環が、推論装置３に存在する。最初の時刻において第２配列の要素が初期化されてから、初期化された第２配列の要素が第２符号化部３１に入力され、最後の時刻において第２配列の要素が復号化部３３から出力されるまでの単位時間ごとに、この循環における推論処理が繰り返される。 In this way, the inference device 3 contains a cycle in which a signal departs from the second encoding unit 31, passes through the attention mechanism 32 and the decoding unit 33, and returns to the second encoding unit 31. The inference processing in this cycle is repeated every unit time, from when the elements of the second array are initialized at the first time and input to the second encoding unit 31 until the elements of the second array are output from the decoding unit 33 at the last time.
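The cycle described above can be sketched as a simple loop. This is only an illustration under stated assumptions: `encode2`, `attend`, and `decode` are hypothetical stand-ins for the second encoding unit, the attention mechanism, and the decoding unit, and the fixed step count `num_steps` is an assumption, since the description does not say how the last time is determined.

```python
def run_inference(first_features, num_steps, encode2, attend, decode, init_element):
    """Generate the second array one element per unit time."""
    prev_element = init_element  # element initialized at the first time
    second_array, weight_rows = [], []
    for _ in range(num_steps):
        prev_feature = encode2(prev_element)            # second encoding unit
        weights = attend(prev_feature, first_features)  # attention mechanism
        element = decode(weights, first_features)       # decoding unit
        second_array.append(element)
        weight_rows.append(weights)                     # one row of the weight matrix
        prev_element = element                          # fed back to the second encoder
    return second_array, weight_rows

# Toy stand-ins operating on scalar features (assumptions, not the trained networks).
first_features = [1.0, 2.0, 3.0]
uniform = lambda prev, feats: [1.0 / len(feats)] * len(feats)
weighted = lambda w, feats: sum(wi * fi for wi, fi in zip(w, feats))

second_array, weight_matrix = run_inference(
    first_features, 4, lambda e: e, uniform, weighted, 0.0
)
```

Collecting one weight row per step yields the full weight matrix that the attention mechanism hands to the decoding unit.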

注意機構３２は、第２配列の各要素に対する第１特徴配列の各要素の重みを全て含む行列を、重み行列として復号化部３３に出力する。また、復号化部３３は、全ての時刻における第２配列の各要素を、第２配列として推論部３４に出力する。 The attention mechanism 32 outputs a matrix including all the weights of each element of the first feature array for each element of the second array to the decoding unit 33 as a weight matrix. Further, the decoding unit 33 outputs each element of the second array at all times to the inference unit 34 as a second array.

推論部３４は、第２配列を、復号化部３３から取得する。推論部３４は、第２配列に基づいて推論結果を生成する。音声合成又は音声変換等の応用問題では、推論結果は、音声信号である。推論部３４は、所定の外部装置（不図示）に推論結果を出力する。 The inference unit 34 obtains the second array from the decoding unit 33. The inference unit 34 generates an inference result based on the second array. In applied problems such as speech synthesis or speech conversion, the inference result is an audio signal. The inference unit 34 outputs the inference result to a predetermined external device (not shown).

推論装置３の機能部の詳細を説明する。 The details of the functional units of the inference device 3 are described below.
＜第１符号化部３０＞ <First encoding unit 30>
第１符号化部３０は、第１配列を入力として取得する。第１符号化部３０は、第１配列を使用して、数値又は数値ベクトルを要素とする配列を、第１特徴配列として導出する。例えば、第１符号化部３０は、参考文献２（Jonathan Shen, Ruoming Pang, Ron J. Weiss, Mike Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, RJ Skerry-Ryan, Rif A. Saurous, Yannis Agiomyrgiannakis, and Yonghui Wu, "Natural TTS synthesis by conditioning wavenet on MEL spectrogram predictions," In ICASSP, pp.4779-4783, 2018.）の人工ニューラルネットワークを使用して、第１特徴配列を第１配列から導出する。第１符号化部３０は、人工ニューラルネットワークのパラメータを、学習段階において学習データを使用して決定する。第１符号化部３０は、第１特徴配列を注意機構３２と復号化部３３に出力する。 The first encoding unit 30 obtains the first array as input. The first encoding unit 30 uses the first array to derive, as a first feature array, an array whose elements are numerical values or numerical vectors. For example, the first encoding unit 30 derives the first feature array from the first array using the artificial neural network of Reference 2 (Jonathan Shen, Ruoming Pang, Ron J. Weiss, Mike Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, RJ Skerry-Ryan, Rif A. Saurous, Yannis Agiomyrgiannakis, and Yonghui Wu, "Natural TTS synthesis by conditioning wavenet on MEL spectrogram predictions," In ICASSP, pp.4779-4783, 2018.). The first encoding unit 30 determines the parameters of the artificial neural network in the learning stage using learning data. The first encoding unit 30 outputs the first feature array to the attention mechanism 32 and the decoding unit 33.

第１符号化部３０の処理の詳細は、以下の通りである。 Details of the processing of the first encoding unit 30 are as follows.
第１配列は、例えば、「1×Ｎ×５１２」のテンソルである。「Ｎ」は配列の長さを表す。「５１２」は、配列の要素の次元数の例である。第１符号化部３０は、第１配列を人工ニューラルネットワークに入力する。 The first array is, for example, a "1×N×512" tensor. "N" represents the length of the array. "512" is an example of the number of dimensions of the elements of the array. The first encoding unit 30 inputs the first array to the artificial neural network.

人工ニューラルネットワークは、例えば、３個の「１×５×５１２」の畳み込み層と、１個の双方向長短期記憶（Bidirectional Long Short-Term Memory : BiLSTM）（以下「双方向ＬＳＴＭ」という。）とを備える。各畳み込み層の直後にバッチ正規化層が備えられる。バッチ正規化層の直後において、活性化関数としてＲｅＬＵ層が備えられる。双方向ＬＳＴＭは、合計５１２個の隠れユニットを有する。第１符号化部３０の双方向ＬＳＴＭは、数値又は数値ベクトルを要素とする配列を第１特徴配列として、注意機構３２と復号化部３３とに出力する。 The artificial neural network includes, for example, three "1×5×512" convolutional layers and one bidirectional long short-term memory (BiLSTM) (hereinafter referred to as "bidirectional LSTM"). A batch normalization layer is provided immediately after each convolutional layer. Immediately after each batch normalization layer, a ReLU layer is provided as an activation function. The bidirectional LSTM has a total of 512 hidden units. The bidirectional LSTM of the first encoding unit 30 outputs an array whose elements are numerical values or numerical vectors, as the first feature array, to the attention mechanism 32 and the decoding unit 33.

＜第２符号化部３１＞ <Second encoding unit 31>
第２符号化部３１は、第２配列を復号化部３３から取得する。１個前の時刻における第２配列の要素を、復号化部３３から取得する。第２符号化部３１は、１個前の時刻における第２配列の要素を使用して、１個前の時刻における第２特徴配列の要素として、数値又は数値ベクトルを導出する。数値又は数値ベクトルの導出には、例えば、上述の参考文献２の人工ニューラルネットワークを使用することができる。人工ニューラルネットワークのパラメータは、学習段階で学習データを使用して決定される。第２符号化部３１は、第２特徴配列を注意機構３２に出力する。 The second encoding unit 31 obtains the second array from the decoding unit 33. Specifically, it acquires the elements of the second array at the previous time from the decoding unit 33. The second encoding unit 31 uses the elements of the second array at the previous time to derive a numerical value or a numerical vector as an element of the second feature array at the previous time. For this derivation, for example, the artificial neural network of Reference 2 mentioned above can be used. The parameters of the artificial neural network are determined in the learning stage using learning data. The second encoding unit 31 outputs the second feature array to the attention mechanism 32.

第２符号化部３１の処理の詳細は、以下の通りである。 Details of the processing of the second encoding unit 31 are as follows.
１個前の時刻における第２配列の各要素は、例えば、５１２次元の数値ベクトルである。第２符号化部３１は、１個前の時刻における第２配列の各要素を、人工ニューラルネットワークに入力する。この人工ニューラルネットワークは、例えば、２個の全結合層を備える。各全結合層は２５６個の隠れユニットを有する。各全結合層の直後には、活性化関数としてＲｅＬＵ層が備えられる。最後の全結合層は、１個前の時刻における第２特徴配列の要素として、数値又は数値ベクトルを注意機構３２に出力する。 Each element of the second array at the previous time is, for example, a 512-dimensional numerical vector. The second encoding unit 31 inputs each element of the second array at the previous time to the artificial neural network. This artificial neural network includes, for example, two fully connected layers. Each fully connected layer has 256 hidden units. Immediately after each fully connected layer, a ReLU layer is provided as an activation function. The final fully connected layer outputs a numerical value or a numerical vector to the attention mechanism 32 as an element of the second feature array at the previous time.
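The two-layer network described above can be sketched in plain Python as follows. The layer sizes (512 inputs, 256 hidden units per layer, ReLU after each layer) follow the example in the text; the random parameters, the helper names, and the zero biases are placeholders assumed for illustration, since the real parameters are determined in the learning stage.

```python
import random

def relu(v):
    """Element-wise ReLU activation."""
    return [max(0.0, x) for x in v]

def dense(v, weights, bias):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]

def init_layer(n_in, n_out, rng):
    """Random placeholder parameters (assumption); training would set these."""
    weights = [[rng.uniform(-0.05, 0.05) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

rng = random.Random(0)
layer1 = init_layer(512, 256, rng)
layer2 = init_layer(256, 256, rng)

def encode_second(element):
    """Map a 512-dimensional element of the second array to a 256-dimensional feature."""
    hidden = relu(dense(element, *layer1))
    return relu(dense(hidden, *layer2))

prev_element = [rng.uniform(-1.0, 1.0) for _ in range(512)]
prev_feature = encode_second(prev_element)
```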

＜注意機構３２＞ <Attention mechanism 32>
注意機構３２は、第１特徴配列を第１符号化部３０から取得する。注意機構３２は、第２特徴配列を第２符号化部３１から取得する。注意機構３２は、１個前の時刻における第２特徴配列の要素と、第１特徴配列の各要素とを使用して、現在の時刻に対する第２配列の要素に対する第１特徴配列の各要素の重みを導出する。注意機構３２として、例えば、人工ニューラルネットワークが使用されてもよいし、人工ニューラルネットワーク以外の数理モデル（例えば、線形回帰モデル、多項式回帰モデル、ロジスティック回帰モデル）が使用されてもよい。人工ニューラルネットワークのパラメータは、学習段階において、学習データを使用して決定される。注意機構３２は、重み行列を復号化部３３に出力する。 The attention mechanism 32 obtains the first feature array from the first encoding unit 30. The attention mechanism 32 acquires the second feature array from the second encoding unit 31. The attention mechanism 32 uses the elements of the second feature array at the previous time and each element of the first feature array to derive the weight of each element of the first feature array with respect to the element of the second array at the current time. As the attention mechanism 32, for example, an artificial neural network may be used, or a mathematical model other than an artificial neural network (for example, a linear regression model, a polynomial regression model, or a logistic regression model) may be used. The parameters of the artificial neural network are determined in the learning stage using learning data. The attention mechanism 32 outputs the weight matrix to the decoding unit 33.

注意機構３２の処理の詳細は、以下の通りである。 Details of the processing of the attention mechanism 32 are as follows.
注意機構３２は、１個前の時刻における第２特徴配列の要素である数値ベクトルと、第１特徴配列の各要素である数値ベクトルとを、数値ベクトルの次元方向に沿って連結する。注意機構３２は、連結された数値ベクトルを、人工ニューラルネットワークに入力する。人工ニューラルネットワークは、例えば、３個の全結合層を備える。３個の全結合層において、１個目の全結合層が６４個の隠れユニットを有し、２個目の全結合層が１６個の隠れユニットを有し、３個目の全結合層が１個の隠れユニットを有する。１個目の全結合層の直後において、活性化関数としてＲｅＬＵ層が備えられる。２個目の全結合層の直後において、活性化関数としてＲｅＬＵ層が備えられる。３個目の全結合層は、１個の実数を出力する。 The attention mechanism 32 concatenates the numerical vector that is the element of the second feature array at the previous time with the numerical vector that is each element of the first feature array, along the dimension direction of the numerical vectors. The attention mechanism 32 inputs the concatenated numerical vector to the artificial neural network. The artificial neural network includes, for example, three fully connected layers. Of the three fully connected layers, the first has 64 hidden units, the second has 16 hidden units, and the third has 1 hidden unit. Immediately after the first fully connected layer, a ReLU layer is provided as an activation function. Immediately after the second fully connected layer, a ReLU layer is provided as an activation function. The third fully connected layer outputs one real number.

注意機構３２は、１個前の時刻における第２特徴配列の要素と第１特徴配列の各要素とを使用して導出された実数を全て含む配列を、Ｓｏｆｔｍａｘ関数によって正規化する。この導出された実数を全て含む配列とは、第１特徴配列の各要素に対して出力された実数を配列としてまとめたものである。導出された実数を全て含む配列は、第１特徴配列の要素数と同じ数の実数を含む。注意機構３２は、現在の時刻における第２配列の要素に対する第１特徴配列の各要素の重みとして、正規化された実数を導出する。注意機構３２は、現在の時刻における第２配列の要素に対する第１特徴配列の各要素の重みを全て含む行列を、重み行列として復号化部３３に出力する。 The attention mechanism 32 normalizes an array including all real numbers derived using the elements of the second feature array and each element of the first feature array at the previous time using the Softmax function. The array including all the derived real numbers is an array of real numbers output for each element of the first feature array. The array containing all the derived real numbers contains the same number of real numbers as the number of elements of the first feature array. The attention mechanism 32 derives a normalized real number as the weight of each element of the first feature array relative to the element of the second array at the current time. The attention mechanism 32 outputs a matrix including all the weights of each element of the first feature array to the elements of the second array at the current time to the decoding unit 33 as a weight matrix.
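The scoring and normalization steps can be sketched as follows. The 64-16-1 layer sizes and the ReLU placement follow the example in the text. The toy feature dimensions `K` and `D`, the random placeholder parameters, the helper names, and the concatenation order are assumptions made for illustration.

```python
import math
import random

def relu(v):
    return [max(0.0, x) for x in v]

def dense(v, weights, bias):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]

def init_layer(n_in, n_out, rng):
    """Random placeholder parameters (assumption); training would set these."""
    return ([[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)],
            [0.0] * n_out)

def softmax(scores):
    """Normalize real-valued scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

rng = random.Random(0)
K, D = 8, 4  # toy dimensions of the first and second feature elements (assumptions)
layers = [init_layer(D + K, 64, rng), init_layer(64, 16, rng), init_layer(16, 1, rng)]

def score(prev_second_feature, first_feature):
    x = prev_second_feature + first_feature  # concatenate along the dimension axis
    x = relu(dense(x, *layers[0]))
    x = relu(dense(x, *layers[1]))
    return dense(x, *layers[2])[0]           # third layer: one real number, no ReLU

def attention_weights(prev_second_feature, first_features):
    """One weight per element of the first feature array."""
    return softmax([score(prev_second_feature, f) for f in first_features])

first_features = [[rng.uniform(-1, 1) for _ in range(K)] for _ in range(5)]
prev_second_feature = [rng.uniform(-1, 1) for _ in range(D)]
weights = attention_weights(prev_second_feature, first_features)
```

Each call yields one row of the weight matrix: as many weights as the first feature array has elements.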

＜復号化部３３＞ <Decoding unit 33>
復号化部３３は、第１特徴配列を第１符号化部３０から取得する。復号化部３３は、重み行列を注意機構３２から取得する。復号化部３３は、現在の時刻における第２配列の要素に対する第１特徴配列の各要素の重みを使用して、第１特徴配列の各要素に対して重み付けを実行する。復号化部３３は、重み付けによって得られた数値又は数値ベクトルを使用して、現在の時刻における第２配列の要素を導出する。例えば、復号化部３３は、上述の参考文献２の人工ニューラルネットワークを使用して、現在の時刻における第２配列の要素を導出する。復号化部３３は、人工ニューラルネットワークのパラメータを、学習段階において学習データを使用して決定する。復号化部３３は、第２配列を推論部３４に出力する。 The decoding unit 33 acquires the first feature array from the first encoding unit 30. The decoding unit 33 acquires the weight matrix from the attention mechanism 32. The decoding unit 33 weights each element of the first feature array using the weight of each element of the first feature array with respect to the element of the second array at the current time. The decoding unit 33 uses the numerical value or numerical vector obtained by the weighting to derive the elements of the second array at the current time. For example, the decoding unit 33 uses the artificial neural network of Reference 2 mentioned above to derive the elements of the second array at the current time. The decoding unit 33 determines the parameters of the artificial neural network in the learning stage using learning data. The decoding unit 33 outputs the second array to the inference unit 34.

復号化部３３の処理の詳細は、以下の通りである。 Details of the processing of the decoding unit 33 are as follows.
復号化部３３は、現在の時刻における第２配列の要素に対する第１特徴配列の各要素の重みを用いて、第１特徴配列の全ての要素の加重総和を導出する。これによって、現在の時刻における第２配列の要素との対応関係にある第１特徴配列の要素が、加重総和として特定（抽出又は生成）される。すなわち、現在の時刻における第２配列の要素との対応関係にある第１特徴配列の要素が整列される。従って、第１配列と第２配列との間に存在する局所的な変移と速度の変化とに関する非線形の時間変動が補償される。 The decoding unit 33 derives a weighted sum of all elements of the first feature array using the weight of each element of the first feature array with respect to the element of the second array at the current time. As a result, the element of the first feature array that corresponds to the element of the second array at the current time is identified (extracted or generated) as the weighted sum. That is, the element of the first feature array that corresponds to the element of the second array at the current time is aligned. Therefore, the non-linear temporal variations in local displacement and speed that exist between the first array and the second array are compensated for.

ここで、第１特徴配列は「Ｘ∈Ｒ^Ｗ×Ｋ」と表記される。重み行列は「Ｐ∈Ｒ^Ｗ×Ｗ」と表記される。現在の時刻における第２配列の要素に対する第１特徴配列の各要素の重みを全て含む行ベクトルは「ｐ_ｉ∈Ｒ^１×Ｗ」と表記される。現在の時刻は「ｉ」と表記される。現在の時刻における第２配列の要素に対する第１特徴配列の全ての要素の加重総和は、「ｐ_ｉＸ」と表記される。 Here, the first feature array is written as "X ∈ R^(W×K)". The weight matrix is written as "P ∈ R^(W×W)". The row vector containing all the weights of the elements of the first feature array with respect to the element of the second array at the current time is written as "p_i ∈ R^(1×W)". The current time is written as "i". The weighted sum of all elements of the first feature array for the element of the second array at the current time is written as "p_i X".
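As a small worked example of the product p_i X, the sketch below multiplies a 1×W weight row by a W×K feature matrix. The values of `X` and `p_i` are toy numbers chosen for illustration only.

```python
def weighted_sum(p_i, X):
    """Return p_i X: a 1xW row vector times a WxK matrix gives a 1xK vector."""
    num_dims = len(X[0])
    return [sum(p * row[k] for p, row in zip(p_i, X)) for k in range(num_dims)]

# Toy first feature array with W = 3 elements of K = 2 dimensions each.
X = [[1.0, 0.0],
     [0.0, 1.0],
     [2.0, 2.0]]
p_i = [0.5, 0.25, 0.25]  # weights for the current time i, summing to 1

context = weighted_sum(p_i, X)  # [1.0, 0.75]
```

When one weight is close to 1 the result approaches the corresponding element of the first feature array, which is the sense in which the weighted sum "extracts" the aligned element.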

加重総和は、例えば、１２８次元の数値ベクトルである。復号化部３３は、この数値ベクトルを、人工ニューラルネットワークに入力する。人工ニューラルネットワークは、例えば、２個の双方向ＬＳＴＭと１個の全結合層とを備える。各双方向ＬＳＴＭは、１０２４個の隠れユニットを有する。全結合層は、数値又は数値ベクトルを、現在の時刻における第２配列の要素として推論部３４に出力する。 The weighted sum is, for example, a 128-dimensional numerical vector. The decoding unit 33 inputs this numerical vector to the artificial neural network. The artificial neural network comprises, for example, two bidirectional LSTMs and one fully connected layer. Each bidirectional LSTM has 1024 hidden units. The fully connected layer outputs the numerical value or numerical vector to the inference unit 34 as an element of the second array at the current time.

なお、復号化部３３は、第２符号化部３１から出力された第２特徴配列と、第１特徴配列と、重み行列とを使用して、第２配列を導出してもよい。この場合、復号化部３３は、加重総和である数値ベクトルと、１個前の時刻における第２特徴配列の要素である数値ベクトルとを、数値ベクトルの次元方向に沿って連結する。復号化部３３は、連結された数値ベクトルを、人工ニューラルネットワークに入力する。 Note that the decoding unit 33 may derive the second array using the second feature array output from the second encoding unit 31, the first feature array, and the weight matrix. In this case, the decoding unit 33 concatenates the numerical vector that is the weighted sum with the numerical vector that is the element of the second feature array at the previous time, along the dimension direction of the numerical vectors. The decoding unit 33 inputs the concatenated numerical vector to the artificial neural network.

＜推論部３４＞ <Inference unit 34>
推論部３４は、第２配列を復号化部３３から取得する。推論部３４は、第２配列に基づいて推論結果を生成する。音声合成又は音声変換等の応用問題では、推論結果は、音声信号である。推論部３４は、例えば、所定の生成方法（参考文献３：Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew W. Senior, Koray Kavukcuoglu, "WaveNet: A generative model for raw audio, " SSW 2016: 125.）を用いて、第２配列に基づいて音声信号を生成する。推論部３４は、所定の外部装置（不図示）に推論結果を出力する。 The inference unit 34 obtains the second array from the decoding unit 33. The inference unit 34 generates an inference result based on the second array. In applied problems such as speech synthesis or voice conversion, the inference result is a speech signal. The inference unit 34 generates the speech signal based on the second array using, for example, a predetermined generation method (Reference 3: Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew W. Senior, Koray Kavukcuoglu, "WaveNet: A generative model for raw audio," SSW 2016: 125.). The inference unit 34 outputs the inference result to a predetermined external device (not shown).

次に、学習段階における、音声合成又は音声変換などの応用問題に適用される学習方法について説明する。 Next, a learning method applied to applied problems such as speech synthesis or speech conversion in the learning stage will be explained.

図７は、第２実施形態における、学習装置４の構成例を示す図である。第２実施形態の学習段階では、音声合成又は音声変換などの応用問題に学習方法が適用される。学習装置４は、第１配列と正解配列とを入力として取得する。学習装置４は、目的関数値と制約関数値とを導出する。学習装置４は、目的関数値と制約関数値とに基づいて数理モデルを学習し、学習済の数理モデル（学習結果）を、所定の外部装置（不図示）に出力する。また、学習装置４は、学習済の数理モデルを、実行段階よりも前に推論装置３に出力する。 FIG. 7 is a diagram showing a configuration example of the learning device 4 in the second embodiment. In the learning stage of the second embodiment, the learning method is applied to applied problems such as speech synthesis or speech conversion. The learning device 4 receives the first array and the correct array as input. The learning device 4 derives an objective function value and a constraint function value. The learning device 4 learns a mathematical model based on the objective function value and the constraint function value, and outputs the learned mathematical model (learning result) to a predetermined external device (not shown). Further, the learning device 4 outputs the learned mathematical model to the inference device 3 before the execution stage.

第１配列と正解配列とは、所定の目的（例えば、音声合成又は音声変換）のタスクを実行するための数理モデルを学習するために使用される学習データである。目的関数値と制約関数値とは、数理モデルを学習装置４が学習するために使用される。例えば、多数の学習データを使用して導出された目的関数値と制約関数値との加重総和又は加重平均が可能な限り小さくなるように（例えば、最小になるように）、学習装置４は、数理モデルのパラメータを更新する。学習データの数が多いほど、数理モデルの性能が向上する。学習データの数は、例えば、２万から３万程度である。 The first array and the correct array are learning data used to train a mathematical model that performs a task for a predetermined purpose (for example, speech synthesis or voice conversion). The objective function value and the constraint function value are used by the learning device 4 to train the mathematical model. For example, the learning device 4 updates the parameters of the mathematical model so that the weighted sum or weighted average of the objective function values and the constraint function values derived from a large amount of learning data becomes as small as possible (for example, is minimized). The more learning data there is, the better the performance of the mathematical model. The number of learning data items is, for example, about 20,000 to 30,000.

学習装置４は、第１符号化部４０と、第２符号化部４１と、注意機構４２と、復号化部４３と、目的関数値導出部４４と、制約関数値導出部４５と、更新部４６とを備える。 The learning device 4 includes a first encoding unit 40, a second encoding unit 41, an attention mechanism 42, a decoding unit 43, an objective function value deriving unit 44, a constraint function value deriving unit 45, and an updating unit 46.

第１符号化部４０は、第１配列を入力として取得する。第１符号化部４０は、第１配列に対する符号化処理を例えば１回だけ実行することによって、第１特徴配列を導出する。第１符号化部４０は、第１特徴配列を注意機構４２と復号化部４３とに出力する。 The first encoding unit 40 obtains the first array as input. The first encoding unit 40 derives a first feature array by performing encoding processing on the first array, for example, only once. The first encoding unit 40 outputs the first feature array to the attention mechanism 42 and the decoding unit 43.

第２符号化部４１は、１個前の時刻における第２配列の要素を、復号化部４３から取得する。第２符号化部４１は、１個前の時刻における第２配列の要素に対する符号化処理を実行することによって、１個前の時刻における第２特徴配列の要素を導出する。 The second encoding unit 41 acquires the elements of the second array at the previous time from the decoding unit 43. The second encoding unit 41 derives the elements of the second feature array at the previous time by performing encoding processing on the elements of the second array at the previous time.

注意機構４２は、第１特徴配列を、第１符号化部４０から取得する。注意機構４２は、１個前の時刻における第２特徴配列の要素を、第２符号化部４１から取得する。注意機構４２は、１個前の時刻における第２特徴配列の要素と第１特徴配列の各要素とを使用して、現在の時刻における第２配列の要素に対する第１特徴配列の各要素の重みを導出する。注意機構４２は、現在の時刻における第２配列の要素に対する第１特徴配列の各要素の重みを、重み行列として復号化部４３に出力する。 The attention mechanism 42 obtains the first feature array from the first encoding unit 40. The attention mechanism 42 acquires the elements of the second feature array at the previous time from the second encoding unit 41. The attention mechanism 42 uses the elements of the second feature array at the previous time and each element of the first feature array to derive the weight of each element of the first feature array with respect to the element of the second array at the current time. The attention mechanism 42 outputs the weight of each element of the first feature array with respect to the element of the second array at the current time to the decoding unit 43 as a weight matrix.

復号化部４３は、第１特徴配列を第１符号化部４０から取得する。復号化部４３は、現在の時刻における第２配列の要素に対する第１特徴配列の各要素の重みを、重み行列として注意機構４２から取得する。復号化部４３は、現在の時刻における第２配列の要素に対する第１特徴配列の各要素の重みと、第１特徴配列とに基づいて、現在の時刻における第２配列の要素を導出する。復号化部４３は、現在の時刻における第２配列の要素を、第２符号化部４１と目的関数値導出部４４とに出力する。 The decoding unit 43 acquires the first feature array from the first encoding unit 40. The decoding unit 43 acquires the weight of each element of the first feature array with respect to the element of the second array at the current time from the attention mechanism 42 as a weight matrix. The decoding unit 43 derives the elements of the second array at the current time based on the weight of each element of the first feature array with respect to the elements of the second array at the current time and the first feature array. The decoding unit 43 outputs the elements of the second array at the current time to the second encoding unit 41 and the objective function value deriving unit 44.

第２符号化部４１は、現在の時刻における第２配列の要素を、復号化部４３から取得する。第２符号化部４１は、現在の時刻における第２配列の要素を使用して、現在の時刻における第２特徴配列の要素を導出する。第２符号化部４１は、現在の時刻における第２特徴配列の要素を、注意機構４２に出力する。 The second encoding unit 41 obtains the elements of the second array at the current time from the decoding unit 43. The second encoding unit 41 uses the elements of the second array at the current time to derive the elements of the second feature array at the current time. The second encoding unit 41 outputs the elements of the second feature array at the current time to the attention mechanism 42.

このように、信号が第２符号化部４１から出発し、注意機構４２と復号化部４３とを信号が経由し、第２符号化部４１に信号が再び戻るという循環が、学習装置４に存在する。この循環では、最初の時刻において第２配列の要素が初期化されてから、初期化された第２配列の要素が第２符号化部４１に入力され、最後の時刻において第２配列の要素が復号化部４３から出力されるまでの単位時間ごとに、学習処理が繰り返される。 In this way, the learning device 4 contains a cycle in which a signal departs from the second encoding unit 41, passes through the attention mechanism 42 and the decoding unit 43, and returns to the second encoding unit 41. In this cycle, the learning processing is repeated every unit time, from when the elements of the second array are initialized at the first time and input to the second encoding unit 41 until the elements of the second array are output from the decoding unit 43 at the last time.

注意機構４２は、第２配列の各要素に対する第１特徴配列の各要素の重みを全て含む行列を、重み行列として復号化部４３に出力する。また、復号化部４３は、全ての時刻における第２配列の各要素を、第２配列として第２符号化部４１と目的関数値導出部４４とに出力する。 The attention mechanism 42 outputs a matrix including all the weights of each element of the first feature array for each element of the second array to the decoding unit 43 as a weight matrix. Further, the decoding unit 43 outputs each element of the second array at every time as a second array to the second encoding unit 41 and the objective function value deriving unit 44.

目的関数値導出部４４は、正解配列を入力として取得する。目的関数値導出部４４は、第２配列を復号化部４３から取得する。目的関数値導出部４４は、正解配列と第２配列とに基づいて、目的関数値を導出する。目的関数値導出部４４が目的関数値を導出する処理は、例えば１回だけ実行される。目的関数値導出部４４は、目的関数値を更新部４６に出力する。 The objective function value deriving unit 44 obtains the correct answer array as input. The objective function value deriving unit 44 obtains the second array from the decoding unit 43. The objective function value deriving unit 44 derives an objective function value based on the correct array and the second array. The process in which the objective function value derivation unit 44 derives the objective function value is executed, for example, only once. The objective function value deriving unit 44 outputs the objective function value to the updating unit 46.

制約関数値導出部４５は、重み行列を注意機構４２から取得する。制約関数値導出部４５は、重み行列を使用して、制約関数値を導出する。制約関数値導出部４５が制約関数値を導出する処理は、例えば１回だけ実行される。制約関数値導出部４５は、制約関数値を更新部４６に出力する。 The constraint function value deriving unit 45 obtains the weight matrix from the attention mechanism 42 . The constraint function value deriving unit 45 derives a constraint function value using the weight matrix. The process of deriving a constraint function value by the constraint function value deriving unit 45 is executed, for example, only once. The constraint function value deriving unit 45 outputs the constraint function value to the updating unit 46.

更新部４６は、目的関数値を目的関数値導出部４４から取得する。更新部４６は、制約関数値を制約関数値導出部４５から取得する。更新部４６は、目的関数値と制約関数値とに基づいて学習処理を実行する。更新部４６は、制約関数値と目的関数値との加重総和又は加重平均が可能な限り小さくなるように（例えば、最小になるように）、第１符号化部４０と第２符号化部４１と注意機構４２と復号化部４３とを含む数理モデルを更新する。更新部４６は、所定の外部装置（不図示）に、学習済の数理モデル（学習結果）を出力する。 The updating unit 46 obtains the objective function value from the objective function value deriving unit 44. The updating unit 46 obtains the constraint function value from the constraint function value deriving unit 45. The updating unit 46 executes learning processing based on the objective function value and the constraint function value. The updating unit 46 updates the first encoding unit 40 and the second encoding unit 41 so that the weighted sum or weighted average of the constraint function value and the objective function value is as small as possible (for example, minimized). The mathematical model including the attention mechanism 42 and the decoding unit 43 is updated. The updating unit 46 outputs the learned mathematical model (learning result) to a predetermined external device (not shown).
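The quantity that the updating unit minimizes can be sketched as below. The function names and the weight value `constraint_weight=0.1` are hypothetical; the description only states that a weighted sum or weighted average of the two values is made as small as possible.

```python
def combined_loss(objective_value, constraint_value, constraint_weight=0.1):
    """Weighted sum of the objective and constraint function values.

    `constraint_weight` is a hypothetical hyperparameter, not given in the text.
    """
    return objective_value + constraint_weight * constraint_value

def combined_loss_average(objective_value, constraint_value, constraint_weight=0.1):
    """Weighted-average variant of the same quantity."""
    return combined_loss(objective_value, constraint_value, constraint_weight) / (
        1.0 + constraint_weight
    )

loss = combined_loss(2.0, 5.0)  # 2.0 + 0.1 * 5.0
```

A larger `constraint_weight` pushes training toward weight matrices that satisfy the constraints, at the cost of fitting the correct array less closely.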

学習装置４の機能部の詳細を説明する。 The details of the functional units of the learning device 4 are described below.
＜第１符号化部４０＞ <First encoding unit 40>
第１符号化部４０は、第１配列を入力として取得する。学習段階における第１符号化部４０が実行する処理は、実行段階における第１符号化部３０が実行する処理と同じである。第１符号化部４０は、第１特徴配列を注意機構４２と復号化部４３に出力する。 The first encoding unit 40 obtains the first array as input. The processing executed by the first encoding unit 40 in the learning stage is the same as the processing executed by the first encoding unit 30 in the execution stage. The first encoding unit 40 outputs the first feature array to the attention mechanism 42 and the decoding unit 43.

＜第２符号化部４１＞ <Second encoding unit 41>
第２符号化部４１は、第２配列を復号化部４３から取得し、第２特徴配列を注意機構４２に出力する。学習段階における第２符号化部４１の処理は、実行段階における第２符号化部３１の処理と同じである。なお、学習段階における第２符号化部４１は、第２配列を入力として使用する代わりに、正解配列を入力として使用してもよい。この場合、第２配列に対して実行される全ての処理は、第２配列の代わりに使用される正解配列に対して実行される。 The second encoding unit 41 acquires the second array from the decoding unit 43 and outputs the second feature array to the attention mechanism 42. The processing of the second encoding unit 41 in the learning stage is the same as the processing of the second encoding unit 31 in the execution stage. Note that the second encoding unit 41 in the learning stage may use the correct array as input instead of the second array. In this case, all processing that would be performed on the second array is performed on the correct array used in its place.
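Feeding the correct array to the second encoder in place of the decoder output is commonly called teacher forcing. A minimal sketch of how the encoder inputs could be arranged in that case is given below; the helper name and the initial element are assumptions for illustration.

```python
def teacher_forced_inputs(correct_array, init_element):
    """Inputs to the second encoder when the correct array replaces the second array.

    At time i the encoder receives the correct element from time i - 1,
    and the initialized element at the first time.
    """
    return [init_element] + correct_array[:-1]

correct_array = [[0.1], [0.2], [0.3]]
inputs = teacher_forced_inputs(correct_array, [0.0])
```

With this arrangement an early decoding error does not propagate to later steps during training, because each step is conditioned on the correct previous element.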

＜注意機構４２＞ <Attention mechanism 42>
注意機構４２は、第１特徴配列を第１符号化部４０から取得する。注意機構４２は、第２特徴配列を第２符号化部４１から取得する。学習段階における注意機構４２の処理は、実行段階における注意機構３２の処理と同じである。注意機構４２は、重み行列を復号化部４３と制約関数値導出部４５とに出力する。 The attention mechanism 42 obtains the first feature array from the first encoding unit 40. The attention mechanism 42 acquires the second feature array from the second encoding unit 41. The processing of the attention mechanism 42 in the learning stage is the same as the processing of the attention mechanism 32 in the execution stage. The attention mechanism 42 outputs the weight matrix to the decoding unit 43 and the constraint function value deriving unit 45.

＜復号化部４３＞ <Decoding unit 43>
復号化部４３は、第１特徴配列を第１符号化部４０から取得する。復号化部４３は、重み行列を注意機構４２から取得する。学習段階における復号化部４３の処理は、実行段階における復号化部３３の処理と同じである。復号化部４３は、第２配列を目的関数値導出部４４に出力する。 The decoding unit 43 acquires the first feature array from the first encoding unit 40. The decoding unit 43 acquires the weight matrix from the attention mechanism 42. The processing of the decoding unit 43 in the learning stage is the same as the processing of the decoding unit 33 in the execution stage. The decoding unit 43 outputs the second array to the objective function value deriving unit 44.

＜目的関数値導出部４４＞ <Objective function value deriving unit 44>
目的関数値導出部４４は、正解配列を入力として取得する。目的関数値導出部４４は、第２配列を復号化部４３から取得する。目的関数値導出部４４は、正解配列と第２配列との間の差分を導出する。目的関数値導出部４４は、導出された差分が大きいほど値が大きくなるような目的関数値を導出する。目的関数値導出部４４は、目的関数値を更新部４６に出力する。 The objective function value deriving unit 44 obtains the correct array as input. The objective function value deriving unit 44 obtains the second array from the decoding unit 43. The objective function value deriving unit 44 derives the difference between the correct array and the second array. The objective function value deriving unit 44 derives an objective function value that becomes larger as the derived difference becomes larger. The objective function value deriving unit 44 outputs the objective function value to the updating unit 46.

目的関数値導出部４４の処理の詳細は、以下の通りである。
目的関数値導出部４４は、例えば、正解配列と第２配列との間の残差平方和（類似度）を、目的関数値として導出する。ここで、正解配列は「Ｚ^＊」と表記される。第２配列は「Ｚ」と表記される。従って、目的関数値は、式（８）のように表される。 Details of the processing of the objective function value deriving unit 44 are as follows.
The objective function value deriving unit 44 derives, for example, the residual sum of squares (similarity) between the correct array and the second array as the objective function value. Here, the correct array is written as "Z^*" and the second array is written as "Z". The objective function value is therefore expressed as in equation (8).

ここで、「||・||」は、Ｌ２ノルムを表す。 Here, “||·||” represents the L2 norm.
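Equation (8) is not reproduced in this text. A form consistent with the definitions above (residual sum of squares between the correct array Z^* and the second array Z, with ||·|| the L2 norm) would be the following, where the symbol L_obj is a name chosen here for illustration and is not taken from the original:

```latex
\mathcal{L}_{\mathrm{obj}} = \left\lVert Z^{*} - Z \right\rVert^{2} \tag{8}
```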

＜制約関数値導出部４５＞
制約関数値導出部４５は、重み行列を注意機構４２から取得する。制約関数値導出部４５は、重み行列を使用して、制約関数値を導出する。ここで、単調性制約と連続性制約とのうちの少なくとも一方を満たす度合いが大きいほど、制約関数値が小さくなるように、制約関数値は導出される。制約関数値導出部４５は、制約関数値を更新部４６に出力する。 <Constraint function value deriving unit 45>
The constraint function value deriving unit 45 acquires the weight matrix from the attention mechanism 42. The constraint function value deriving unit 45 derives a constraint function value using the weight matrix. The constraint function value is derived so that it becomes smaller as the degree to which at least one of the monotonicity constraint and the continuity constraint is satisfied becomes greater. The constraint function value deriving unit 45 outputs the constraint function value to the updating unit 46.

制約関数値が最小化されることによって、第１特徴配列の各要素と第２配列の各要素との間の対応関係が単調性制約と連続性制約とのうちの少なくとも一方を満たすという重み行列を導出するように数理モデルは学習される。この数理モデルは、第１符号化部４０と、第２符号化部４１と、注意機構４２と、復号化部４３とを含む。 By minimizing the constraint function value, the mathematical model is trained to derive a weight matrix in which the correspondence between each element of the first feature array and each element of the second array satisfies at least one of the monotonicity constraint and the continuity constraint. This mathematical model includes the first encoding unit 40, the second encoding unit 41, the attention mechanism 42, and the decoding unit 43.

制約関数値導出部４５の処理の詳細は、以下の通りである。
重み行列とは、第１特徴配列の各要素と第２配列の各要素とが対応関係にある確率を表す行列である。重み行列は、対応関係そのものではない。従って、単調性制約と連続性制約とのうちの少なくとも一方が満たされる度合いを、重み行列からは直接評価することができない。 Details of the processing by the constraint function value deriving unit 45 are as follows.
The weight matrix is a matrix representing the probability that each element of the first feature array and each element of the second array have a corresponding relationship. The weight matrix is not a correspondence relationship itself. Therefore, the degree to which at least one of the monotonicity constraint and the continuity constraint is satisfied cannot be directly evaluated from the weight matrix.

単調性制約と連続性制約とのうちの少なくとも一方が満たされる度合いを評価することができるようになるためには、重み行列が変換される必要がある。例えば、第２配列の各要素の時刻を独立変数とし、第２配列の各要素の時刻との対応関係にある第１特徴配列の要素の添字を従属変数とした関数（対応関数）のような形に、重み行列が変換される必要がある。このために、制約関数値導出部４５は、重み行列と所定の等差数列との積を、対応配列として導出する。等差数列とは、各項（各要素）がその直前の項（要素）に一定数（公差）を加えて得られる数列である。 To make it possible to evaluate the degree to which at least one of the monotonicity constraint and the continuity constraint is satisfied, the weight matrix needs to be transformed. For example, the weight matrix needs to be transformed into a form like a function (correspondence function) whose independent variable is the time of each element of the second array and whose dependent variable is the subscript of the element of the first feature array that corresponds to that time. For this purpose, the constraint function value deriving unit 45 derives the product of the weight matrix and a predetermined arithmetic progression as a correspondence array. An arithmetic progression is a sequence in which each term (each element) is obtained by adding a constant (the common difference) to the term (element) immediately before it.

例えば図３では、「［１,２,３,４］^Ｔ」が等差数列である。等差数列を用いて導出された対応配列において、対応配列の添字は第２配列の各要素の時刻を表す。対応配列の要素である数値は、第２配列の各要素との対応関係にある第１特徴配列の要素の添字又は添字に比例する数値を表す。図３における上側に表された例では、第２配列の１番目の要素が、第１特徴配列の１番目の要素との対応関係にある。第２配列の２番目の要素が第１特徴配列の２番目の要素との対応関係にある。第２配列の３番目の要素が、第１特徴配列の２番目の要素との対応関係にあることを、対応配列が表している。第２配列の４番目の要素との対応関係にある第１特徴配列の要素の添字は、整数を用いて表されているのではなく、実数を用いて「３．６」と表されている。このような対応配列が使用されることによって、単調性制約と連続性制約とのうちの少なくとも一方が満たされる度合いを評価することが可能になる。 For example, in FIG. 3, "[1,2,3,4]^T" is the arithmetic progression. In the correspondence array derived using the arithmetic progression, the subscript of the correspondence array represents the time of each element of the second array. Each numerical value in the correspondence array represents the subscript of the element of the first feature array that corresponds to the respective element of the second array, or a value proportional to that subscript. In the example shown in the upper part of FIG. 3, the correspondence array indicates that the first element of the second array corresponds to the first element of the first feature array, the second element of the second array corresponds to the second element of the first feature array, and the third element of the second array corresponds to the second element of the first feature array. The subscript of the element of the first feature array that corresponds to the fourth element of the second array is expressed not as an integer but as the real number "3.6". Using such a correspondence array makes it possible to evaluate the degree to which at least one of the monotonicity constraint and the continuity constraint is satisfied.
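The product of a weight matrix and the arithmetic progression [1,2,3,4]^T can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the weight values are hypothetical (chosen only so that the result matches the correspondences 1, 2, 2 and 3.6 described for FIG. 3), the function name `correspondence_array` is an assumption, and the matrix is assumed to hold one row per element of the second array and one column per element of the first feature array.

```python
# Hypothetical weight matrix (NOT the actual values of FIG. 3).
# Assumption: row t holds the probabilities that element t of the second
# array corresponds to elements 1..4 of the first feature array.
weights = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.4, 0.6],
]

# Arithmetic progression with first term 1 and common difference 1.
progression = [1, 2, 3, 4]


def correspondence_array(weight_matrix, arithmetic_progression):
    """Matrix-vector product: one expected subscript per row of the matrix."""
    return [
        sum(w * a for w, a in zip(row, arithmetic_progression))
        for row in weight_matrix
    ]


corr = correspondence_array(weights, progression)
print(corr)  # approximately [1.0, 2.0, 2.0, 3.6], up to floating-point rounding
```

Because each row is a probability distribution, each entry of the result is an expected subscript, which is why the last entry can be a non-integer value.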

対応配列を使用して導出される制約関数値は、単調性制約と連続性制約とのうちの少なくとも一方が満たされる度合いが大きいほど値が小さくなる必要がある。なお、勾配法を使用して学習装置４が数理モデルを学習するために、重み行列又は対応配列に対して制約関数値が微分可能であることが望ましい。また、より高速な学習を可能にするために、制約関数値の導出の並列化が容易であることが望ましい。 The constraint function value derived using the correspondence array needs to be smaller as the degree to which at least one of the monotonicity constraint and the continuity constraint is satisfied is greater. Note that in order for the learning device 4 to learn a mathematical model using the gradient method, it is desirable that the constraint function value is differentiable with respect to the weight matrix or the corresponding array. Furthermore, in order to enable faster learning, it is desirable that the derivation of constraint function values can be easily parallelized.

制約関数値導出部４５は、単調性制約関数値と連続性制約関数値とのうちの少なくとも一方を、制約関数値として導出する。 The constraint function value deriving unit 45 derives at least one of the monotonicity constraint function value and the continuity constraint function value as a constraint function value.

＜単調性制約関数値＞
第２実施形態における単調性制約関数値に関する説明は、第１実施形態における単調性制約関数値に関する説明と同様である。 <Monotonicity constraint function value>
The explanation regarding the monotonicity constraint function value in the second embodiment is the same as the explanation regarding the monotonicity constraint function value in the first embodiment.

＜連続性制約関数値＞
第２実施形態における連続性制約関数値に関する説明は、第１実施形態における連続性制約関数値に関する説明と同様である。 <Continuity constraint function value>
The explanation regarding the continuity constraint function value in the second embodiment is the same as the explanation regarding the continuity constraint function value in the first embodiment.

＜更新部４６＞
更新部４６は、目的関数値を目的関数値導出部４４から取得する。更新部４６は、制約関数値を制約関数値導出部４５から取得する。更新部４６は、目的関数値と制約関数値とに基づいて学習処理を実行する。更新部４６は、所定の外部装置（不図示）に、学習済の数理モデル（学習結果）を出力する。学習処理は、特定の学習処理に限定されない。 <Update section 46>
The updating unit 46 obtains the objective function value from the objective function value deriving unit 44. The updating unit 46 obtains the constraint function value from the constraint function value deriving unit 45. The updating unit 46 executes learning processing based on the objective function value and the constraint function value. The updating unit 46 outputs the learned mathematical model (learning result) to a predetermined external device (not shown). The learning process is not limited to a specific learning process.
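How the updating unit might combine the two values can be sketched as follows. The text only states that the learning process is based on both values and is not limited to a specific process, so the weighted sum below, the function names, and the parameter `constraint_weight` are all assumptions made for illustration.

```python
def residual_sum_of_squares(correct, predicted):
    """Sum of squared element-wise differences between two arrays of vectors."""
    return sum(
        (c - p) ** 2
        for correct_vec, predicted_vec in zip(correct, predicted)
        for c, p in zip(correct_vec, predicted_vec)
    )


def total_loss(objective_value, constraint_value, constraint_weight=0.1):
    """ASSUMPTION: a weighted sum is one common way to combine the two values."""
    return objective_value + constraint_weight * constraint_value


# Hypothetical correct array Z^* and second array Z (two time steps, two dimensions).
correct_array = [[1.0, 2.0], [3.0, 4.0]]
second_array = [[1.0, 1.0], [2.0, 4.0]]

objective = residual_sum_of_squares(correct_array, second_array)
loss = total_loss(objective, constraint_value=0.5)
print(objective, loss)
```

In a gradient-based learning process, a scalar such as `loss` would be the quantity minimized with respect to the parameters of the mathematical model.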

以上のように、注意機構４２は、第１配列に基づく第１特徴配列と第２配列に基づく第２特徴配列とを用いて、第１特徴配列と第２特徴配列との各要素が対応関係にある確率を表す行列である重み行列を生成する。復号化部４３は、現在の時刻における第２配列の要素に対する第１特徴配列の各要素の重みと、第１特徴配列とに基づいて、現在の時刻における第２配列の要素を導出する。目的関数値導出部４４は、正解配列と第２配列とに応じた値である目的関数値を導出する。制約関数値導出部４５は、重み行列に基づいて制約関数値を導出する。更新部４６は、目的関数値と制約関数値とに基づいて所定の学習処理を実行することによって、第１符号化部４０と第２符号化部４１と注意機構４２と復号化部４３とを含む数理モデルのパラメータを更新し、学習結果を生成する。目的関数値は、例えば、正解配列と第２配列との間の差分又は残差平方和である。更新部４６は、数理モデルを更新する。 As described above, the attention mechanism 42 uses the first feature array based on the first array and the second feature array based on the second array to generate a weight matrix, which is a matrix representing the probability that each element of the first feature array and each element of the second feature array are in a correspondence relationship. The decoding unit 43 derives the element of the second array at the current time based on the first feature array and on the weight of each element of the first feature array with respect to the element of the second array at the current time. The objective function value deriving unit 44 derives an objective function value that is a value according to the correct array and the second array. The constraint function value deriving unit 45 derives a constraint function value based on the weight matrix. The updating unit 46 executes a predetermined learning process based on the objective function value and the constraint function value, thereby updating the parameters of the mathematical model that includes the first encoding unit 40, the second encoding unit 41, the attention mechanism 42, and the decoding unit 43, and generating a learning result. The objective function value is, for example, the difference or the residual sum of squares between the correct array and the second array. The updating unit 46 updates the mathematical model.

学習段階において更新された数理モデルは、実行段階において推論処理の実行に使用される。実行段階において、注意機構３２は、第１配列に基づく第１特徴配列と第２配列に基づく第２特徴配列とを用いて、第１特徴配列と第２特徴配列との各要素が対応関係にある確率を表す行列である重み行列を生成する。復号化部３３は、第１特徴配列と重み行列とに基づいて、第２配列を導出する。推論部３４は、第２配列に基づいて所定の推論処理を実行することによって推論結果を生成する。 The mathematical model updated in the learning stage is used to execute inference processing in the execution stage. In the execution stage, the attention mechanism 32 uses the first feature array based on the first array and the second feature array based on the second array to generate a weight matrix, which is a matrix representing the probability that each element of the first feature array and each element of the second feature array are in a correspondence relationship. The decoding unit 33 derives the second array based on the first feature array and the weight matrix. The inference unit 34 generates an inference result by executing a predetermined inference process based on the second array.

このように、単調性制約と連続性制約とのうちの少なくとも一方を表す制約関数値を用いて学習された数理モデルを用いて符号化部が特徴配列を導出することによって、有効に働く重み行列を注意機構が生成する。 In this way, the encoding unit derives the feature array using the mathematical model learned with the constraint function value representing at least one of the monotonicity constraint and the continuity constraint, so that the attention mechanism generates a weight matrix that works effectively.

これによって、人手によって設計された特徴表現の使用に依存することなく、音声合成又は音声変換などの応用問題に対して、より複雑な特徴表現を導出及び使用可能であると同時に、単調で連続的な対応関数を導出及び使用可能な配列整列を実現することが可能である。人手によって設計された特徴表現の使用に依存することなく、音声合成又は音声変換などの応用問題に対して、より複雑な特徴表現を実現することが可能である。また、音声合成又は音声変換などの推論精度の向上と学習時間の短縮とを両立させることが可能である。 This makes it possible to realize sequence alignment that can derive and use more complex feature representations for application problems such as speech synthesis or speech conversion, without depending on manually designed feature representations, and that can at the same time derive and use a monotonic and continuous correspondence function. More complex feature representations can be realized for application problems such as speech synthesis or speech conversion without depending on manually designed feature representations. In addition, improved inference accuracy in tasks such as speech synthesis or speech conversion can be achieved together with a shorter learning time.

図８は、各実施形態における、推論装置１のハードウェア構成例を示す図である。推論装置１の各機能部のうちの一部又は全部は、ＣＰＵ（Central Processing Unit）等のプロセッサ１００が、不揮発性の記録媒体（非一時的な記録媒体）を有する記憶部２００に記憶されたプログラムを実行することにより、ソフトウェアとして実現される。プログラムは、コンピュータ読み取り可能な記録媒体に記録されてもよい。コンピュータ読み取り可能な記録媒体とは、例えばフレキシブルディスク、光磁気ディスク、ＲＯＭ（Read Only Memory）、ＣＤ－ＲＯＭ（Compact Disc Read Only Memory）等の可搬媒体、コンピュータシステムに内蔵されるハードディスク等の記憶装置などの非一時的な記録媒体である。通信部３００は、推論装置１による処理結果を外部装置（不図示）に送信する。通信部３００は、通信回線を経由してプログラムを受信してもよい。表示部４００は、推論装置１による処理結果を表示する。表示部４００は、例えば、液晶ディスプレイ、有機ＥＬ（Electro Luminescence）ディスプレイである。 FIG. 8 is a diagram showing an example of the hardware configuration of the inference device 1 in each embodiment. Some or all of the functional units of the inference device 1 are realized as software by a processor 100, such as a CPU (Central Processing Unit), executing a program stored in a storage unit 200 having a non-volatile recording medium (non-transitory recording medium). The program may be recorded on a computer-readable recording medium. The computer-readable recording medium is a non-transitory recording medium, for example a portable medium such as a flexible disk, a magneto-optical disk, a ROM (Read Only Memory), or a CD-ROM (Compact Disc Read Only Memory), or a storage device such as a hard disk built into a computer system. The communication unit 300 transmits the processing results of the inference device 1 to an external device (not shown). The communication unit 300 may receive the program via a communication line. The display unit 400 displays the processing results of the inference device 1. The display unit 400 is, for example, a liquid crystal display or an organic EL (Electro Luminescence) display.

推論装置１の各機能部のうちの一部又は全部は、例えば、ＬＳＩ（Large Scale Integration circuit）、ＡＳＩＣ（Application Specific Integrated Circuit）、ＰＬＤ（Programmable Logic Device）又はＦＰＧＡ（Field Programmable Gate Array）等を用いた電子回路（electronic circuit又はcircuitry）を含むハードウェアを用いて実現されてもよい。なお、推論装置３のハードウェア構成例は、推論装置１のハードウェア構成例と同様である。 Some or all of the functional units of the inference device 1 may be realized using hardware including an electronic circuit (circuitry) that uses, for example, an LSI (Large Scale Integration circuit), an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), or an FPGA (Field Programmable Gate Array). Note that the hardware configuration example of the inference device 3 is the same as that of the inference device 1.

図９は、各実施形態における、学習装置２のハードウェア構成例を示す図である。学習装置２の各機能部のうちの一部又は全部は、ＣＰＵ等のプロセッサ１０１が、不揮発性の記録媒体（非一時的な記録媒体）を有する記憶部２０１に記憶されたプログラムを実行することにより、ソフトウェアとして実現される。プログラムは、コンピュータ読み取り可能な記録媒体に記録されてもよい。コンピュータ読み取り可能な記録媒体とは、例えばフレキシブルディスク、光磁気ディスク、ＲＯＭ、ＣＤ－ＲＯＭ等の可搬媒体、コンピュータシステムに内蔵されるハードディスク等の記憶装置などの非一時的な記録媒体である。通信部３０１は、学習装置２による処理結果を外部装置（不図示）に送信する。通信部３０１は、通信回線を経由してプログラムを受信してもよい。表示部４０１は、学習装置２による処理結果を表示する。表示部４０１は、例えば、液晶ディスプレイ、有機ＥＬディスプレイである。 FIG. 9 is a diagram showing an example of the hardware configuration of the learning device 2 in each embodiment. Some or all of the functional units of the learning device 2 are realized as software by a processor 101, such as a CPU, executing a program stored in a storage unit 201 having a non-volatile recording medium (non-transitory recording medium). The program may be recorded on a computer-readable recording medium. The computer-readable recording medium is a non-transitory recording medium, for example a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or a storage device such as a hard disk built into a computer system. The communication unit 301 transmits the processing results of the learning device 2 to an external device (not shown). The communication unit 301 may receive the program via a communication line. The display unit 401 displays the processing results of the learning device 2. The display unit 401 is, for example, a liquid crystal display or an organic EL display.

学習装置２の各機能部のうちの一部又は全部は、例えば、ＬＳＩ、ＡＳＩＣ、ＰＬＤ又はＦＰＧＡ等を用いた電子回路（electronic circuit又はcircuitry）を含むハードウェアを用いて実現されてもよい。なお、学習装置４のハードウェア構成例は、学習装置２のハードウェア構成例と同様である。 Some or all of the functional units of the learning device 2 may be realized using hardware including an electronic circuit using an LSI, an ASIC, a PLD, an FPGA, or the like, for example. Note that the example hardware configuration of the learning device 4 is similar to the example hardware configuration of the learning device 2.

以上、この発明の実施形態について図面を参照して詳述してきたが、具体的な構成はこの実施形態に限られるものではなく、この発明の要旨を逸脱しない範囲の設計等も含まれる。 Although the embodiments of the present invention have been described above in detail with reference to the drawings, the specific configuration is not limited to these embodiments and also includes designs and the like that do not depart from the gist of the present invention.

本発明は、学習装置及び推論装置に適用可能である。 The present invention is applicable to learning devices and inference devices.

１…推論装置、２…学習装置、３…推論装置、４…学習装置、１０…符号化部、１１…注意機構、１２…照合部、１３…推論部、２０…符号化部、２１…注意機構、２２…目的関数値導出部、２３…制約関数値導出部、２４…更新部、３０…第１符号化部、３１…第２符号化部、３２…注意機構、３３…復号化部、３４…推論部、４０…第１符号化部、４１…第２符号化部、４２…注意機構、４３…復号化部、４４…目的関数値導出部、４５…制約関数値導出部、４６…更新部、１００…プロセッサ、１０１…プロセッサ、２００…記憶部、２０１…記憶部、３００…通信部、３０１…通信部、４００…表示部、４０１…表示部 DESCRIPTION OF SYMBOLS 1...Inference device, 2...Learning device, 3...Inference device, 4...Learning device, 10...Encoding unit, 11...Attention mechanism, 12...Verification unit, 13...Inference unit, 20...Encoding unit, 21...Attention mechanism, 22... objective function value deriving unit, 23... constraint function value deriving unit, 24... updating unit, 30... first encoding unit, 31... second encoding unit, 32... attention mechanism, 33... decoding unit, 34... Reasoning section, 40... First encoding section, 41... Second encoding section, 42... Attention mechanism, 43... Decoding section, 44... Objective function value deriving section, 45... Constraint function value deriving section, 46... Update unit, 100... Processor, 101... Processor, 200... Storage unit, 201... Storage unit, 300... Communication unit, 301... Communication unit, 400... Display unit, 401... Display unit

Claims

A learning device comprising:
an attention mechanism that uses a first feature array based on a first array and a second feature array based on a second array to generate a weight matrix, which is a matrix representing the probability that each element of the first feature array and each element of the second feature array are in a correspondence relationship;
an objective function value deriving unit that derives, based on the weight matrix, an objective function value that is a value according to a label indicating whether the first array and the second array belong to the same class and according to the first feature array and the second feature array;
an updating unit that generates a learning result by executing a predetermined learning process based on the objective function value; and
a constraint function value deriving unit that derives, based on the weight matrix, a constraint function value representing at least one of a monotonicity constraint and a continuity constraint that apply when the elements of the first array and the elements of the second array are in a correspondence relationship,
wherein the monotonicity constraint is a constraint that, as the subscript of an element of the second array increases, the subscript of the element of the first array that corresponds to that element of the second array does not decrease,
the continuity constraint is a constraint that, when the subscripts of adjacent elements of the second array are consecutive, the difference between the subscripts of the elements of the first array that correspond to those adjacent elements is less than or equal to a predetermined positive value, and
the updating unit generates the learning result by executing the predetermined learning process based on the objective function value and the constraint function value.

wherein the objective function value deriving unit derives the objective function value such that the difference or similarity between the first feature array and the second feature array, or between the second feature array and a feature array derived from the weight matrix, is associated with the label,
The learning device according to claim 1.

A learning device comprising:
an attention mechanism that uses a first feature array based on a first array and a second feature array based on a second array to generate a weight matrix, which is a matrix representing the probability that each element of the first feature array and each element of the second feature array are in a correspondence relationship;
a decoding unit that derives the element of the second array at the current time based on the first feature array and on the weight of each element of the first feature array with respect to the element of the second array at the current time;
an objective function value deriving unit that derives an objective function value that is a value according to a correct array and the second array;
an updating unit that generates a learning result by executing a predetermined learning process based on the objective function value; and
a constraint function value deriving unit that derives, based on the weight matrix, a constraint function value representing at least one of a monotonicity constraint and a continuity constraint that apply when the elements of the first array and the elements of the second array are in a correspondence relationship,
wherein the monotonicity constraint is a constraint that, as the subscript of an element of the second array increases, the subscript of the element of the first array that corresponds to that element of the second array does not decrease,
the continuity constraint is a constraint that, when the subscripts of adjacent elements of the second array are consecutive, the difference between the subscripts of the elements of the first array that correspond to those adjacent elements is less than or equal to a predetermined positive value, and
the updating unit generates the learning result by executing the predetermined learning process based on the objective function value and the constraint function value.

The constraint function value deriving unit decreases the constraint function value as the degree to which at least one of the monotonicity constraint and the continuity constraint is satisfied is greater;
The learning device according to any one of claims 1 to 3.

wherein the constraint function value deriving unit derives the product of the weight matrix and a predetermined arithmetic progression as a correspondence array, and derives the sum or average of the local monotonicity constraint function values over all elements of the correspondence array as the monotonicity constraint function value,
The learning device according to any one of claims 1 to 4.

wherein the constraint function value deriving unit derives the product of the weight matrix and a predetermined arithmetic progression as a correspondence array; for each element of the correspondence array, derives the absolute value of the difference between that element and the element immediately before it, subtracts a predetermined positive number from the derived absolute value, and derives the larger of the subtraction result and 0 as the local continuity constraint function value; and derives the sum or average of the local continuity constraint function values over all elements of the correspondence array as the continuity constraint function value,
The learning device according to any one of claims 1 to 4.

A matrix representing a probability that each element of the first feature array and the second feature array has a corresponding relationship using a first feature array based on the first array and a second feature array based on the second array. an attention mechanism that generates a weight matrix;
a matching unit that derives a distance between the first array and the second array based on the first feature array, the second feature array, and the weight matrix;
an inference unit that generates an inference result by executing a predetermined inference process based on the distance;
The objective function value is a value according to a label indicating whether the first array and the second array belong to the same class, and the first feature array and the second feature array,
When there is a correspondence between the elements of the first array and the elements of the second array, the constraint function value represents at least one of a monotonicity constraint and a continuity constraint,
The monotonicity constraint is a constraint that as the subscript of the element of the second array increases, the subscript of the element of the first array that has a corresponding relationship with the element of the second array does not decrease,
The continuity constraint means that when the subscripts of adjacent elements in the second array are consecutive, the subscripts of the elements of the first array that have a corresponding relationship with the subscripts of adjacent elements in the second array are The constraint is that the difference is less than or equal to a predetermined positive value,
The inference unit generates the inference result using a learning result generated by executing a predetermined learning process based on the objective function value and the constraint function value.
Inference device.

A matrix representing a probability that each element of the first feature array and the second feature array has a corresponding relationship using a first feature array based on the first array and a second feature array based on the second array. an attention mechanism that generates a weight matrix;
a decoding unit that derives a second array based on the first feature array and the weight matrix;
an inference unit that generates an inference result by performing a predetermined inference process based on the second array,
The objective function value is a value according to the correct answer array and the second array,
When there is a correspondence between the elements of the first array and the elements of the second array, the constraint function value represents at least one of a monotonicity constraint and a continuity constraint,
The monotonicity constraint is a constraint that as the subscript of the element of the second array increases, the subscript of the element of the first array that has a corresponding relationship with the element of the second array does not decrease,
The continuity constraint means that when the subscripts of adjacent elements in the second array are consecutive, the subscripts of the elements of the first array that have a corresponding relationship with the subscripts of adjacent elements in the second array are The constraint is that the difference is less than or equal to a predetermined positive value,
The inference unit generates the inference result using a learning result generated by executing a predetermined learning process based on the objective function value and the constraint function value.
Inference device.

A learning method executed by a learning device, comprising:
A matrix representing a probability that each element of the first feature array and the second feature array has a corresponding relationship using a first feature array based on the first array and a second feature array based on the second array. an attention step of generating a weight matrix;
An objective function value, which is a value corresponding to a label indicating whether the first array and the second array belong to the same class, and the first feature array and the second feature array, is determined based on the weight matrix. an objective function value derivation step,
an updating step of generating a learning result by executing a predetermined learning process based on the objective function value; and
a constraint function value deriving step of deriving, based on the weight matrix, a constraint function value representing at least one of a monotonicity constraint and a continuity constraint that apply when the elements of the first array and the elements of the second array are in a correspondence relationship,
wherein:
The monotonicity constraint is a constraint that as the subscript of the element of the second array increases, the subscript of the element of the first array that has a corresponding relationship with the element of the second array does not decrease,
The continuity constraint means that when the subscripts of adjacent elements in the second array are consecutive, the subscripts of the elements of the first array that have a corresponding relationship with the subscripts of adjacent elements in the second array are The constraint is that the difference is less than or equal to a predetermined positive value,
The updating step includes generating a learning result by performing a predetermined learning process based on the objective function value and the constraint function value.
Learning method.

An inference method executed by an inference device,
A matrix representing a probability that each element of the first feature array and the second feature array has a corresponding relationship using a first feature array based on the first array and a second feature array based on the second array. an attention step of generating a weight matrix;
a matching step of deriving a distance between the first array and the second array based on the first feature array, the second feature array, and the weight matrix;
an inference step of generating an inference result by performing a predetermined inference process based on the distance;
The objective function value is a value according to a label indicating whether the first array and the second array belong to the same class, and the first feature array and the second feature array,
When there is a correspondence between the elements of the first array and the elements of the second array, the constraint function value represents at least one of a monotonicity constraint and a continuity constraint,
The monotonicity constraint is a constraint that as the subscript of the element of the second array increases, the subscript of the element of the first array that has a corresponding relationship with the element of the second array does not decrease,
The continuity constraint means that when the subscripts of adjacent elements in the second array are consecutive, the subscripts of the elements of the first array that have a corresponding relationship with the subscripts of adjacent elements in the second array are The constraint is that the difference is less than or equal to a predetermined positive value,
The inference step includes generating the inference result using a learning result generated by executing a predetermined learning process based on the objective function value and the constraint function value.
Inference method.

A learning method executed by a learning device, comprising:
A matrix representing a probability that each element of the first feature array and the second feature array has a corresponding relationship using a first feature array based on the first array and a second feature array based on the second array. an attention step of generating a weight matrix;
a decoding step of deriving the elements of the second array at the current time based on the weight of each element of the first feature array with respect to the elements of the second array at the current time and the first feature array;
an objective function value deriving step of deriving an objective function value that is a value according to the correct answer array and the second array;
an updating step of generating a learning result by executing a predetermined learning process based on the objective function value; and
a constraint function value deriving step of deriving, based on the weight matrix, a constraint function value representing at least one of a monotonicity constraint and a continuity constraint that apply when the elements of the first array and the elements of the second array are in a correspondence relationship,
wherein:
The monotonicity constraint is a constraint that as the subscript of the element of the second array increases, the subscript of the element of the first array that has a corresponding relationship with the element of the second array does not decrease,
The continuity constraint is a constraint that, when the subscripts of adjacent elements in the second array are consecutive, the difference between the subscripts of the elements of the first array that have a corresponding relationship with those adjacent elements is less than or equal to a predetermined positive value,
The updating step includes generating a learning result by performing a predetermined learning process based on the objective function value and the constraint function value.
Learning method.
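The monotonicity and continuity constraints recited above can be illustrated with a minimal sketch. It is not the constraint function of the claims, whose formula is not given here. It assumes, purely for illustration, that row j of the weight matrix holds the correspondence probabilities of second-array element j over the first-array elements, that the expected first-array subscript of each row stands in for the corresponding element, and that violations are penalized with a hinge. The names `expected_positions`, `monotonicity_penalty`, `continuity_penalty`, `total_loss`, `max_step` and `lam` are hypothetical.

```python
def expected_positions(weights):
    """Expected first-array subscript for each second-array element.

    Assumption: weights[j][i] is the probability that element j of the
    second array corresponds to element i of the first array.
    """
    return [sum(i * w for i, w in enumerate(row)) for row in weights]


def monotonicity_penalty(weights):
    """Positive when the expected first-array subscript decreases
    as the second-array subscript increases (hypothetical hinge form)."""
    pos = expected_positions(weights)
    return sum(max(0.0, pos[j] - pos[j + 1]) for j in range(len(pos) - 1))


def continuity_penalty(weights, max_step=1.0):
    """Positive when consecutive second-array elements map to first-array
    subscripts that differ by more than max_step (the predetermined
    positive value; its magnitude here is an assumption)."""
    pos = expected_positions(weights)
    return sum(
        max(0.0, abs(pos[j + 1] - pos[j]) - max_step)
        for j in range(len(pos) - 1)
    )


def total_loss(objective_value, weights, lam=1.0, max_step=1.0):
    """Combine the objective function value with the constraint function
    value; the weighted-sum form and lam are assumptions."""
    constraint_value = monotonicity_penalty(weights) + continuity_penalty(
        weights, max_step
    )
    return objective_value + lam * constraint_value
```

Under these assumptions, a diagonal weight matrix incurs no penalty, while a weight matrix whose correspondences run backwards or jump by more than `max_step` yields a positive constraint function value that the learning process can reduce together with the objective function value.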

An inference method executed by an inference device, comprising:
an attention step of generating, using a first feature array based on a first array and a second feature array based on a second array, a weight matrix, which is a matrix representing the probability that each element of the first feature array and each element of the second feature array have a corresponding relationship;
a decoding step of deriving a second array based on the first feature array and the weight matrix;
an inference step of generating an inference result by performing a predetermined inference process based on the second array, wherein
The objective function value is a value according to the correct answer array and the second array,
When there is a correspondence between the elements of the first array and the elements of the second array, the constraint function value represents at least one of a monotonicity constraint and a continuity constraint,
The monotonicity constraint is a constraint that as the subscript of the element of the second array increases, the subscript of the element of the first array that has a corresponding relationship with the element of the second array does not decrease,
The continuity constraint is a constraint that, when the subscripts of adjacent elements in the second array are consecutive, the difference between the subscripts of the elements of the first array that have a corresponding relationship with those adjacent elements is less than or equal to a predetermined positive value,
The inference step includes generating the inference result using a learning result generated by executing a predetermined learning process based on the objective function value and the constraint function value.
Inference method.
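The attention step and the decoding step recited above can be sketched as follows. The claims state only that the weight matrix represents correspondence probabilities and that the second array is derived from the first feature array and the weight matrix. The dot-product scoring, the row-wise softmax and the weighted-sum decoding below are assumptions chosen for illustration, and the names `softmax`, `attention_weights` and `decode` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax; turns scores into probabilities."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(first_features, second_features):
    """Weight matrix: row j is a probability distribution over the
    elements of the first feature array for element j of the second
    feature array. Dot-product scoring is an assumption."""
    return [
        softmax([sum(a * b for a, b in zip(q, k)) for k in first_features])
        for q in second_features
    ]


def decode(first_features, weights):
    """Derive each second-array element as the weighted sum of the
    first feature array under the corresponding weight-matrix row
    (an assumed, simplified decoder)."""
    dim = len(first_features[0])
    return [
        [sum(w * k[d] for w, k in zip(row, first_features)) for d in range(dim)]
        for row in weights
    ]
```

Because each row is a softmax output, it sums to one and can be read as the probability of correspondence between one second-array element and every first-array element, which is the property the constraints on the weight matrix rely on.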

A program for causing a computer to function as the learning device according to any one of claims 1 to 6.

A program for causing a computer to function as the inference device according to claim 7 or 8.