JP7019138B2

JP7019138B2 - Coding device, coding method and program

Info

Publication number: JP7019138B2
Application number: JP2017037640A
Authority: JP
Inventors: 亘中鹿; 信二高木; 順一山岸
Original assignee: The University of Electro-Communications; Inter University Research Institute Corp Research Organization of Information and Systems
Current assignee: The University of Electro-Communications; Inter University Research Institute Corp Research Organization of Information and Systems
Priority date: 2017-02-28
Filing date: 2017-02-28
Publication date: 2022-02-15
Anticipated expiration: 2037-02-28
Also published as: JP2018142278A

Description

The present invention relates to a coding device, a coding method, and a program for executing the coding method.

In recent years, methods based on deep learning have achieved dramatically higher accuracy and have been actively studied and applied in a wide range of fields such as image recognition and speech recognition. Many deep learning methods have been proposed, and the restricted Boltzmann machine (hereinafter "RBM") is used as the most representative model. The Deep Belief Net (hereinafter "DBN"), in which RBMs are stacked in multiple layers, is also used. Various extended RBM models have been proposed as well.

Non-Patent Document 1: Richard S. Zemel, Christopher K. Williams, Michael C. Mozer, "Lending Direction to Neural Networks," Neural Networks, Vol. 8, No. 4, pp. 503-512, 1995.

Conventionally, feature extraction using an RBM has used binary or real-valued input features, whichever approach is taken.
For example, speech processing such as speech recognition and speech synthesis uses acoustic features based on the amplitude spectrum, such as mel-frequency cepstrum coefficients (MFCC), mel-cepstral features, and STRAIGHT spectra. However, acoustic features extracted from the amplitude spectrum lack phase information, so a considerable amount of information is lost relative to the original speech data expressed as complex numbers.
Speech processing has been used as the example here, but the same problem of information loss arises when features are extracted from other complex-valued information.
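The information loss described above can be illustrated with a short Python sketch. It is not part of the original specification; the random test signal, the frame length, and all variable names are assumptions chosen for illustration. Two different frames (one a circular shift of the other) have identical amplitude spectra but different complex spectra, so amplitude-only features cannot tell them apart.

```python
import numpy as np

# Hypothetical test frame; any real-valued signal would do.
rng = np.random.default_rng(0)
frame = rng.standard_normal(256)
shifted_frame = np.roll(frame, 10)  # a different signal: circular shift by 10 samples

spec_a = np.fft.rfft(frame)
spec_b = np.fft.rfft(shifted_frame)

# A circular shift only multiplies each bin by a unit-magnitude phase factor,
# so the amplitude spectra agree while the complex spectra do not.
same_amplitude = np.allclose(np.abs(spec_a), np.abs(spec_b))
same_complex = np.allclose(spec_a, spec_b)
print(same_amplitude, same_complex)
```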

Non-Patent Document 1 describes a technique for extracting features using complex numbers in a Boltzmann machine. However, because that technique does not apply the RBM or DBN described above, development of a method capable of extracting features more accurately has been desired.

An object of the present invention is to provide a coding device, a coding method, and a program that apply an RBM to complex numbers and perform accurate feature extraction, thereby enabling good coding based on the extracted features.

The coding device of the present invention includes a parameter learning unit and a coding unit.
The parameter learning unit applies a probability model based on a restricted Boltzmann machine, which assumes that connection weights exist between visible elements representing input data and hidden elements representing latent information, and estimates the hidden elements and the connection weights for training data.
The coding unit applies the restricted Boltzmann machine probability model estimated by the parameter learning unit to coding input data, estimates the hidden elements, and outputs the estimated hidden elements as coded data.
Here, the training data and the coding input data are complex-valued data, and the energy function of the restricted Boltzmann machine probability model includes a cross term between the real part and the imaginary part.

The coding method of the present invention includes a parameter learning process and a coding process.
The parameter learning process applies a probability model based on a restricted Boltzmann machine, which assumes that connection weights exist between visible elements representing input data and hidden elements representing latent information, and estimates the hidden elements and the connection weights for training data.
The coding process applies the restricted Boltzmann machine probability model estimated in the parameter learning process to coding input data, estimates the hidden elements, and outputs the estimated hidden elements as coded data.
Here, the training data obtained in the parameter learning process and the coding input data obtained in the coding process are complex-valued data, and the energy function of the restricted Boltzmann machine probability model includes a cross term between the real part and the imaginary part.

The program of the present invention takes training data and coding input data composed of complex-valued data as input, and causes a computer to execute a step of performing the parameter learning process of the coding method described above and a step of performing the coding process.

According to the present invention, features can be extracted with a complex RBM, which extends the restricted Boltzmann machine (RBM) to complex numbers. Features can therefore be extracted from the input data and coded with high accuracy, which enables efficient coding.

FIG. 1 is a block diagram showing a configuration example of a coding device according to an embodiment of the present invention.
FIG. 2 is a block diagram showing a hardware configuration example of the coding device of FIG. 1.
FIG. 3 is a diagram schematically showing the complex RBM (restricted Boltzmann machine), which is the probability model applied to an embodiment of the present invention.
FIG. 4 is a flowchart showing the flow of parameter learning according to an embodiment of the present invention.
FIG. 5 is a flowchart showing the flow of coding according to an embodiment of the present invention.
FIG. 6 is a flowchart showing the complex RBM learning process of step S13 of FIG. 4.
FIG. 7 is a flowchart showing the coding process of step S23 of FIG. 5.
FIG. 8 is a block diagram showing a configuration example of a decoding device that decodes data coded according to an embodiment of the present invention.
FIG. 9 is a flowchart showing the flow of decoding according to an embodiment of the present invention.
FIG. 10 is a flowchart showing the decoding process of step S52 of FIG. 9.
FIG. 11 is a diagram showing an example of original data (FIG. 11A) and of coded data to which an embodiment of the present invention is applied (FIG. 11B).
FIG. 12 is a characteristic diagram comparing the reconstruction error of the complex RBM to which the present invention is applied with the reconstruction error of a conventional example (GB-RBM).
FIG. 13 is a diagram schematically showing an example in which the complex RBM applied to an embodiment of the present invention is stacked in multiple layers.

A preferred embodiment of the present invention is described below.

[1. Configuration example of coding device]
FIG. 1 is a diagram showing a configuration example of a coding device according to an embodiment of the present invention. As shown in FIG. 1, the coding device 1, which is configured by a computer (PC) or the like, includes a parameter learning unit 11 and a coding processing unit 12.
The parameter learning unit 11 performs a learning process in advance on data of the same type as the data to be coded, and obtains the parameters required for coding. The coding processing unit 12 codes the input data (data for coding) using the parameters obtained in that learning process.
Various kinds of data, such as audio data and image data, can be used as the input data to be coded. However, as described later, the training data and the input data handled in this embodiment are complex-valued data.

The parameter learning unit 11 includes a complex data acquisition unit 111, a preprocessing unit 112, and a parameter estimation unit 113. Complex-valued training data is supplied to the complex data acquisition unit 111. The complex-valued training data acquired by the complex data acquisition unit 111 is preprocessed by the preprocessing unit 112 and then supplied to the parameter estimation unit 113.
For example, when the complex-valued training data acquired by the complex data acquisition unit 111 is audio data, the preprocessing unit 112 cuts the training audio data into segments of unit time (hereinafter "frames"), calculates spectral features of the audio signal for each frame, such as MFCC (mel-frequency cepstrum coefficients) or mel-cepstral features, and normalizes them. The training data may also be converted into complex-valued data by the processing in the preprocessing unit 112.
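As a minimal sketch of this kind of preprocessing, the following Python code cuts a signal into frames, converts each frame to a complex spectrum with an FFT, and normalizes each frequency bin. It is an illustration only: the function name `frames_to_complex_spectra`, the Hann window, the frame length of 256 samples, the hop of 80 samples (5 msec at an assumed 16 kHz sampling rate), and the per-bin normalization are all assumptions, not details taken from the specification.

```python
import numpy as np

def frames_to_complex_spectra(signal, frame_len=256, hop=80):
    """Slice a 1-D signal into frames and return normalized complex spectra.

    Hypothetical helper: the window, frame length, hop and normalization
    scheme are illustrative assumptions.
    """
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    spectra = np.fft.rfft(frames, axis=1)  # complex, shape (n_frames, frame_len // 2 + 1)
    # Normalize each frequency bin to zero mean and (approximately) unit variance.
    mean = spectra.mean(axis=0)
    scale = np.sqrt(np.mean(np.abs(spectra - mean) ** 2, axis=0)) + 1e-8
    return (spectra - mean) / scale

signal = np.random.default_rng(0).standard_normal(16000)  # stand-in for one second of audio
z = frames_to_complex_spectra(signal)
print(z.shape, z.dtype)
```

Each row of `z` is then one complex-valued input vector of the kind supplied to the parameter estimation unit.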

The parameter estimation unit 113 has a probability model composed of a visible element estimation unit 1131 and a hidden element estimation unit 1132. In this embodiment, a complex RBM, which extends the RBM to complex numbers, is used as this probability model. In addition to the visible elements and hidden elements, the complex RBM probability model also holds information on the connection weights between elements, and the parameter estimation unit 113 estimates and holds this connection weight information as well. Details of the complex RBM are described later.

The coding processing unit 12 includes a complex data acquisition unit 121, a preprocessing unit 122, and a coding unit 123.
Complex-valued data for coding is supplied to the complex data acquisition unit 121. The complex-valued data for coding acquired by the complex data acquisition unit 121 is preprocessed by the preprocessing unit 122 and then supplied to the coding unit 123.
The preprocessing unit 122 has the same configuration as the preprocessing unit 112 of the parameter learning unit 11. The data for coding may be converted into complex-valued data by the processing in the preprocessing unit 122.

The coding unit 123 has the same configuration as the parameter estimation unit 113 of the parameter learning unit 11, and has a complex RBM probability model composed of the visible elements obtained by a visible element estimation unit 1231 and the hidden elements obtained by a hidden element estimation unit 1232. When the visible element estimation unit 1231 and the hidden element estimation unit 1232 estimate the visible elements and hidden elements, they use the parameters estimated by the parameter estimation unit 113 of the parameter learning unit 11.

The coding device 1 outputs the hidden elements estimated by the hidden element estimation unit 1232 of the coding unit 123 to the outside as coded data.
In the configuration shown in FIG. 1, the parameter estimation unit 113, which performs the learning process, and the coding unit 123, which codes the input data, are separate components. However, the parameter estimation unit 113 and the coding unit 123 have almost the same functions, so the parameter estimation unit 113 may also perform the processing of the coding unit 123. The complex data acquisition units 111 and 121 and the preprocessing units 112 and 122 may likewise be shared.

FIG. 2 is a diagram showing a hardware configuration example of the coding device 1. Here, an example in which the coding device 1 is configured by a computer (PC) is shown.
As shown in FIG. 2, the coding device 1 includes a CPU (Central Processing Unit) 101, a ROM (Read Only Memory) 102, a RAM (Random Access Memory) 103, an HDD (Hard Disk Drive)/SSD (Solid State Drive) 104, a connection I/F (Interface) 105, and a communication I/F 106, which are connected to one another via a bus 107. The CPU 101 controls the overall operation of the coding device 1 by executing a program stored in the ROM 102, the HDD/SSD 104, or the like, using the RAM 103 as a work area. The connection I/F 105 is an interface with devices connected to the coding device 1. The communication I/F 106 is an interface for communicating with other information processing devices via a network.

Input, output, and setting of the training data and the data for coding are performed via the connection I/F 105 or the communication I/F 106. The functions of the coding device 1 described with reference to FIG. 1 are realized by the CPU 101 executing a predetermined program. The program may be obtained via a recording medium, obtained via a network, or incorporated in the ROM. Instead of a combination of a general-purpose computer and a program, the configuration of the coding device 1 may also be realized in hardware by building logic circuits such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).

[2. Definition of complex RBM]
Next, the complex RBM, which is the probability model held by the parameter estimation unit 113 and the coding unit 123, is described.
The RBM is a probability model that assumes bidirectional connection weights between visible elements representing input data and hidden elements representing latent information (with no connections among visible elements or among hidden elements). The complex RBM extends the RBM to complex numbers, which have a real part and an imaginary part.
FIG. 3 shows an example of a graph representation of the complex RBM of this embodiment.
The example of FIG. 3 shows a complex RBM model whose visible elements are complex-valued I-dimensional data z ∈ C^I.
In FIG. 3, z denotes the visible elements, h the hidden elements, W′ the bidirectional connection weights between the visible elements z and the hidden elements h, b′ the bias of the visible elements z, c the bias of the hidden elements h, and q the conjugate. A line above a symbol (an overline) denotes the complex conjugate.

This complex RBM is defined by [Equation 1] to [Equation 4] below. Here, the I-dimensional data z ∈ C^I are the visible elements, θ is the set of parameters of the probability model, and the superscript H denotes the Hermitian transpose.

Φ in [Equation 3] is defined by [Equation 5], and the parameters in [Equation 5] that represent the variance and the pseudo-variance (the covariance with the complex conjugate) of the complex number z are defined by [Equation 6]. Here, Δ is a function that returns a diagonal matrix whose diagonal elements are the entries of the input vector.
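The roles of the variance, the pseudo-variance, and Δ can be illustrated with the following Python sketch. It is not taken from the specification. The synthetic data and variable names are assumptions, and the block layout used for `Phi` is the standard augmented covariance matrix of the stacked vector [z; conj(z)], which is assumed here rather than confirmed by [Equation 5].

```python
import numpy as np

rng = np.random.default_rng(0)
n, dim = 50000, 3
# Synthetic zero-mean complex samples: real part std 2.0, imaginary part std 0.5.
z = 2.0 * rng.standard_normal((n, dim)) + 1j * 0.5 * rng.standard_normal((n, dim))

gamma = np.mean(np.abs(z) ** 2, axis=0)  # variance E[|z|^2]: real and positive
delta = np.mean(z ** 2, axis=0)          # pseudo-variance E[z^2]: complex in general

Gamma = np.diag(gamma)  # plays the role of Delta(gamma)
C = np.diag(delta)      # plays the role of Delta(delta)

# Assumed layout: covariance of the augmented vector [z; conj(z)].
Phi = np.block([[Gamma, C],
                [C.conj().T, Gamma.conj().T]])
print(gamma.round(2), delta.round(2))
```

With these settings the variance is close to 4.25 and the pseudo-variance close to 3.75 in every dimension. A nonzero pseudo-variance indicates that the real and imaginary parts do not have identical, uncorrelated statistics.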

In the end, the parameters of the complex RBM are θ = {b, c, W, γ, δ}. [Equation 7] and [Equation 8] are now introduced. In [Equation 7] and [Equation 8], the fraction bar denotes element-wise division.

From these, [Equation 9] is obtained.

The energy function defined by [Equation 3] can be rewritten as [Equation 10], where R is a function that returns the real part of its complex argument.

Here, the energy function is real-valued. It can be confirmed that each dimension of the complex visible elements z is coupled with its complex conjugate, but that, as in an ordinary (non-complex) RBM, there is no coupling between dimensions. Furthermore, using [Equation 11] and [Equation 12] below, [Equation 3] becomes [Equation 13].

From [Equation 13], as shown in FIG. 3, the relationship between z and h and the relationship between z(^-) and h are mirror images of each other across the conjugate space. In this specification, the "(^-)" in "z(^-)" is an overline denoting the complex conjugate. The "-" is properly placed above "z", as shown in FIG. 3, but it is written as "z(^-)" here because of notational constraints. Overlines on other symbols are written in the same way.
From the above definitions, the conditional probability of the visible elements given the hidden elements and the conditional probability of the hidden elements given the visible elements can be expressed by [Equation 14] and [Equation 15], respectively.

Here, CN(·; μ, Γ, C) is the multivariate complex normal distribution with mean μ, variance-covariance matrix Γ, and pseudo-variance-covariance matrix C, defined by [Equation 16] and [Equation 17]. B(·; π) denotes the multidimensional Bernoulli distribution with success probability π, f(·) denotes the element-wise sigmoid function, and D is the number of dimensions of z.
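The two kinds of distribution named here can be sampled as in the following Python sketch, which handles the scalar (one-dimensional) case only. The function names are hypothetical and the code is an illustration, not the specification's formulas. It relies on the standard fact that a complex normal variable with variance γ = E|z−μ|² and pseudo-variance δ = E(z−μ)² has real and imaginary parts with covariance ½[[γ + Re δ, Im δ], [Im δ, γ − Re δ]].

```python
import numpy as np

def sample_complex_normal(mu, gamma, delta, size, rng):
    """Draw scalar complex normal samples with mean mu, variance gamma, pseudo-variance delta.

    Hypothetical helper; requires |delta| <= gamma so the covariance is valid.
    """
    delta = complex(delta)
    cov = 0.5 * np.array([[gamma + delta.real, delta.imag],
                          [delta.imag, gamma - delta.real]])
    xy = rng.multivariate_normal([0.0, 0.0], cov, size=size)
    return mu + xy[:, 0] + 1j * xy[:, 1]

def sigmoid(a):
    """Element-wise sigmoid, playing the role of f(.)."""
    return 1.0 / (1.0 + np.exp(-a))

def sample_bernoulli(pi, rng):
    """Draw independent 0/1 samples with success probabilities pi."""
    return (rng.random(pi.shape) < pi).astype(float)

rng = np.random.default_rng(0)
mu = 1.0 + 2.0j
z = sample_complex_normal(mu, gamma=2.0, delta=0.5 + 0.3j, size=200000, rng=rng)
h = sample_bernoulli(sigmoid(np.array([0.0, 10.0, -10.0])), rng)
print(np.mean(np.abs(z - mu) ** 2), np.mean((z - mu) ** 2))
```

The printed sample variance and pseudo-variance should be close to the requested 2.0 and 0.5+0.3j.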

[3. Learning process operation and coding process operation]
Next, the coding process performed by applying the complex RBM of this embodiment is described.
FIG. 4 is a flowchart showing the flow of the parameter learning operation performed by the parameter learning unit 11.
First, the complex data acquisition unit 111 acquires complex-valued data for training (step S11), and the preprocessing unit 112 preprocesses that complex-valued data (step S12). For example, when the training data is audio data, the complex data acquisition unit 111 cuts the training audio data into frames (for example, every 5 msec) and calculates spectral features (for example, MFCC or mel-cepstral features) by applying FFT processing or the like to the extracted training audio signal. The training data may also be converted into complex-valued data in this preprocessing.

Next, the preprocessed complex-valued data is supplied to the parameter estimation unit 113, which performs the parameter learning process on the complex-valued data (step S13). Details of the parameter learning process performed in step S13 are described later (FIG. 6).
In this parameter learning process, each parameter of the complex RBM model is determined and stored. The stored parameters are then passed to the coding unit 123, where they are used for coding (step S14).

FIG. 5 is a flowchart showing the flow of the coding process performed by the coding processing unit 12.
First, the complex data acquisition unit 121 acquires complex-valued data for coding (step S21), and the preprocessing unit 122 preprocesses that complex-valued data (step S22). This preprocessing is the same as the preprocessing performed by the preprocessing unit 112 in step S12. As noted earlier in the description of the preprocessing unit 112, the input data may be converted into complex-valued data by this preprocessing.

The preprocessed complex-valued data is supplied to the coding unit 123, which estimates the hidden elements using the parameters of the complex RBM model passed in step S14 and performs the coding process (step S23). Details of the coding process performed in step S23 are described later (FIG. 7). The coding processing unit 12 then outputs the hidden elements obtained in step S23 as coded data (step S24).

FIG. 6 is a flowchart showing details of the parameter learning process performed in step S13 of FIG. 4.
First, the parameter estimation unit 113 sets arbitrary values as the parameters of the complex RBM model (step S31). Next, the preprocessed complex-valued training data is input to the visible element estimation unit 1131 of the parameter estimation unit 113 (step S32).
The parameter estimation unit 113 then calculates the probability values of the hidden elements of the complex RBM model and samples from them (step S33). Here, "sampling" means randomly generating one data item that follows the conditional probability density function, and the term is used with the same meaning below.

The parameter estimation unit 113 then calculates the probability values of the visible elements of the complex RBM model and samples from them (step S34), after which it recalculates the probability values of the hidden elements of the complex RBM model and samples again (step S35). The parameter estimation unit 113 then uses the values obtained so far to update the parameters of the complex RBM model, and stores the updated values (step S36).

After updating the parameters in step S36, the parameter estimation unit 113 determines whether the end condition of the parameter learning process is satisfied (step S37). If it determines that the end condition is not satisfied (NO in step S37), it returns to step S31 and repeats the processing described so far. If it determines in step S37 that the end condition is satisfied (YES in step S37), the parameter estimation unit 113 ends the parameter learning process. One example of the end condition in step S37 is the number of repetitions of this series of steps.
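The sequence of steps S31 to S37 has the same shape as one-step contrastive divergence (CD-1) training of an RBM. The following Python sketch shows that loop for an ordinary binary (Bernoulli-Bernoulli) RBM, which is swapped in here as a stand-in because the complex RBM equations are not reproduced in this text. The function name, learning rate, and toy data are assumptions. The sketch initializes the parameters once and loops over the remaining steps, and it uses the hidden probabilities rather than a second sample in the update, both of which are choices made for this illustration.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_rbm_cd1(data, n_hidden, n_iters=200, lr=0.1, seed=0):
    """CD-1 training of a real-valued binary RBM (stand-in for the complex RBM)."""
    rng = np.random.default_rng(seed)
    n, n_visible = data.shape
    # S31: set arbitrary (small random) initial parameter values.
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b = np.zeros(n_visible)
    c = np.zeros(n_hidden)
    for _ in range(n_iters):  # S37: end condition is a fixed number of repetitions
        v0 = data                                    # S32: input the training data
        ph0 = sigmoid(v0 @ W + c)                    # S33: hidden probabilities...
        h0 = (rng.random(ph0.shape) < ph0).astype(float)  # ...and a sample
        pv1 = sigmoid(h0 @ W.T + b)                  # S34: visible probabilities...
        v1 = (rng.random(pv1.shape) < pv1).astype(float)  # ...and a sample
        ph1 = sigmoid(v1 @ W + c)                    # S35: hidden probabilities again
        # S36: update parameters from data statistics minus model statistics.
        W += lr * (v0.T @ ph0 - v1.T @ ph1) / n
        b += lr * (v0 - v1).mean(axis=0)
        c += lr * (ph0 - ph1).mean(axis=0)
    return W, b, c

# Toy data: two repeating binary patterns.
data = np.array([[1, 1, 0, 0], [0, 0, 1, 1]] * 20, dtype=float)
W, b, c = train_rbm_cd1(data, n_hidden=2, n_iters=50)
```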

FIG. 7 is a flowchart showing details of the coding process performed in step S23 of FIG. 5.
First, the coding unit 123 sets the parameters passed from the parameter estimation unit 113 (step S41). Next, the preprocessed complex-valued data for coding is input to the visible element estimation unit 1231 of the coding unit 123 (step S42).
The hidden element estimation unit 1232 of the coding unit 123 then calculates the hidden elements of the complex RBM model and outputs the estimated hidden elements as coded data (step S43).
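Steps S41 to S43 amount to a single forward computation from visible elements to hidden elements. The following Python sketch illustrates the idea under a strong simplifying assumption: the hidden activation is taken to be a sigmoid of c + 2·Re(Wᴴz), with the variance and pseudo-variance terms omitted. This form is assumed for illustration and is not [Equation 15]. The function name `encode` and the random parameters are likewise hypothetical.

```python
import numpy as np

def encode(z, W, c):
    """Return hidden-element probabilities for complex inputs z (one frame per row).

    Assumed activation: only the real part of the complex pre-activation enters
    the sigmoid, consistent with a real-valued energy function. Variance and
    pseudo-variance scaling are left out of this sketch.
    """
    activation = c + 2.0 * np.real(z @ W.conj())  # (W^H z) for every frame
    return 1.0 / (1.0 + np.exp(-activation))

rng = np.random.default_rng(0)
# S41: parameters that would normally come from the learning process (random here).
W = 0.1 * (rng.standard_normal((5, 3)) + 1j * rng.standard_normal((5, 3)))
c = np.zeros(3)
# S42: four preprocessed complex input frames of dimension 5.
z = rng.standard_normal((4, 5)) + 1j * rng.standard_normal((4, 5))
# S43: the hidden-element estimates serve as the coded data.
code = encode(z, W, c)
print(code.shape)
```

Although the input is complex, the resulting code is real-valued, one value in (0, 1) per hidden element.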

Next, the parameter estimation for the complex RBM model, which is performed in the learning process and the coding process, is described using equations.
In parameter estimation, the parameters of the complex RBM are updated by the complex gradient method so as to maximize the log-likelihood L(θ) of the input data (visible data) z, given by [Equation 18] below. Variables with a tilde are introduced to distinguish them from the variables without a tilde.

The complex gradient method updates the parameters by repeatedly evaluating [Equation 19] with a learning rate α > 0.

The partial derivative with respect to a complex number in [Equation 19] is the Wirtinger derivative shown in [Equation 20], where i is the imaginary unit. The first and second terms on the right-hand side of [Equation 20] are the partial derivative of the log-likelihood L with respect to the real part of the parameter θ and the partial derivative with respect to its imaginary part, respectively.
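The idea of combining the real-part and imaginary-part derivatives into one complex update can be sketched numerically in Python as follows. This is an illustration under stated assumptions: the objective is a toy function rather than the log-likelihood, the derivatives are estimated by finite differences, and the complex gradient is formed as ∂L/∂Re(θ) + i·∂L/∂Im(θ) without the factor of 1/2 that some conventions include.

```python
import numpy as np

def complex_gradient(f, theta, eps=1e-6):
    """Finite-difference estimate of dF/dRe(theta) + i * dF/dIm(theta) for real-valued f."""
    d_re = (f(theta + eps) - f(theta - eps)) / (2 * eps)
    d_im = (f(theta + 1j * eps) - f(theta - 1j * eps)) / (2 * eps)
    return d_re + 1j * d_im

# Toy real-valued objective of a complex parameter, maximized at theta == target.
target = 1.5 - 0.5j

def objective(theta):
    return -abs(theta - target) ** 2

theta = 0.0 + 0.0j
alpha = 0.1  # learning rate, alpha > 0
for _ in range(200):
    theta = theta + alpha * complex_gradient(objective, theta)  # gradient ascent step
print(theta)
```

Because the objective is real-valued, moving along this complex gradient increases it, and `theta` converges to `target`.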

The partial derivative with respect to each parameter contains an expectation over the observed data (input data) and an expectation under the model. Because the model expectation is intractable, it is approximated with the CD (Contrastive Divergence) method, as is done for a conventional RBM.
The partial derivatives of the energy function with respect to the parameters can be obtained analytically and are given by [Equation 21] to [Equation 25].
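The contrastive divergence approximation mentioned above can be sketched as follows. This is a minimal CD-1 example for a conventional binary (Bernoulli-Bernoulli) RBM, chosen only because [Equation 21] to [Equation 25] of the complex RBM are not reproduced here; it is not the update of the complex RBM. The toy data, layer sizes, and function names are assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probs(v, W, c):
    """p(h_j = 1 | v) for a binary RBM with weights W[i][j] and hidden bias c."""
    return [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(c))]


def visible_probs(h, W, b):
    """p(v_i = 1 | h) with visible bias b."""
    return [sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(b))]


def sample(probs, rng):
    return [1.0 if rng.random() < p else 0.0 for p in probs]


def cd1_update(v0, W, b, c, alpha, rng):
    """One CD-1 step: data term minus a one-step Gibbs estimate of the model term."""
    ph0 = hidden_probs(v0, W, c)
    h0 = sample(ph0, rng)
    v1 = sample(visible_probs(h0, W, b), rng)
    ph1 = hidden_probs(v1, W, c)
    for i in range(len(b)):
        for j in range(len(c)):
            W[i][j] += alpha * (v0[i] * ph0[j] - v1[i] * ph1[j])
        b[i] += alpha * (v0[i] - v1[i])
    for j in range(len(c)):
        c[j] += alpha * (ph0[j] - ph1[j])


rng = random.Random(0)
n_visible, n_hidden = 4, 2
W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)] for _ in range(n_visible)]
b = [0.0] * n_visible
c = [0.0] * n_hidden
data = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]  # hypothetical toy patterns
for epoch in range(100):
    for v in data:
        cd1_update(v, W, b, c, alpha=0.1, rng=rng)
```

The model expectation is replaced by statistics of a single reconstruction `v1`, which is what makes the update cheap enough to run per sample.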

Here, ○, |·|, and ·^2 denote the element-wise product, absolute value, and square, respectively, as defined by [Equation 26] and [Equation 27].

Because the variance and the pseudo-variance are on a different scale from the other parameters, in practice they are replaced as shown in [Equation 28], and the parameter updates are carried out on r and s so that learning proceeds stably.

[4. Configuration and operation of the decoding device]
FIG. 8 shows a configuration example of the decoding device 2 corresponding to the coding device 1 according to an embodiment of the present invention.
The decoding device 2 decodes the coded data obtained by the coding device 1 and is implemented by, for example, a computer. The decoding device 2 may be integrated with the coding device 1.
The decoding device 2 includes a parameter learning unit 11 and a decoding processing unit 13.
The parameter learning unit 11 is the same as the parameter learning unit 11 of the coding device 1. Its parameter estimation unit 113 includes a visible element estimation unit 1131 and a hidden element estimation unit 1132, which estimate the visible elements and the hidden elements obtained in the learning processing.

The coded data obtained by the coding device 1 is supplied to the decoding processing unit 13. The decoding processing unit 13 includes a decoding unit 131. The decoding unit 131 has a visible element estimation unit 1311 and a hidden element estimation unit 1312, and acquires the parameters of the complex RBM model from the parameter estimation unit 113.
The hidden element estimation unit 1312 treats the input coded data as the hidden elements. The visible element estimation unit 1311 then obtains estimates of the visible elements by a computation using the parameters of the complex RBM model. These estimates are supplied to the post-processing unit 132, which performs post-processing, for example processing that reverses the pre-processing performed by the pre-processing unit 122 of the coding device 1.
The output unit 133 then outputs the post-processed decoded data.

FIG. 9 is a flowchart showing the flow of decoding in the decoding device 2.
When the decoding device 2 acquires the coded data to be decoded (step S51), the decoding processing unit 13 performs the decoding process. The details of the decoding process are described later (FIG. 10).
The data obtained by the decoding processing unit 13 is supplied to the post-processing unit 132 and post-processed (step S52), and the post-processed data is output from the output unit 133 as decoded data (step S53).

FIG. 10 shows the details of the decoding process in step S52 of the flowchart of FIG. 9.
First, the decoding unit 131 sets the various parameters of the complex RBM model passed from the parameter learning unit 11 (step S61). Specifically, the parameters that were used to code the coded data to be decoded (the parameters used for coding in the coding device 1 shown in FIG. 1) are acquired from the parameter learning unit 11 and set. Next, the coded data is input to the hidden element estimation unit 1312 of the decoding unit 131 (step S62). The visible element estimation unit 1311 then estimates the visible elements (decoded data) using the complex RBM model (step S63).
In this way, the coded data can be decoded by following the coding flow in reverse.
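The decoding flow of steps S61 to S63, followed by post-processing, can be sketched in Python as follows. The sketch assumes that the visible estimate is the affine map b + W h of the hidden elements and that pre-processing was a simple division by a scalar; neither is stated in this section, and the actual estimate also depends on the other model parameters. The names `estimate_visible`, `postprocess`, `decode`, and the toy parameters are hypothetical.

```python
def estimate_visible(h, W, b):
    """Analogue of step S63: z_hat = b + W h (affine mean assumed for illustration)."""
    return [b[i] + sum(W[i][j] * h[j] for j in range(len(h)))
            for i in range(len(b))]


def postprocess(z_hat, scale):
    """Hypothetical inverse of a pre-processing step that divided the data by `scale`."""
    return [scale * z for z in z_hat]


def decode(coded, params, scale=1.0):
    W, b = params["W"], params["b"]     # S61: set the learned parameters
    h = list(coded)                     # S62: the coded data become the hidden elements
    z_hat = estimate_visible(h, W, b)   # S63: estimate the visible elements
    return postprocess(z_hat, scale)


# Toy complex-valued parameters: 2 visible elements, 2 hidden elements.
params = {
    "W": [[1 + 1j, 0j], [0j, 2 - 1j]],
    "b": [0.5j, 1 + 0j],
}
decoded = decode([1, 0], params, scale=2.0)
print(decoded)
```

The parameters passed to `decode` must be the same ones used for coding, which mirrors the requirement that the decoding unit obtain them from the parameter learning unit.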

[5. Differences between the complex RBM and the conventional method (GB-RBM)]
The complex number z = x + iy can also be represented by the GB-RBM (Gaussian-Bernoulli RBM), one of the conventional methods, by using the vector z′ = [x^T y^T]^T ∈ R^{2I} that concatenates its real part and imaginary part. The GB-RBM is given by [Equation 29] to [Equation 32].
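The concatenated representation z′ used for the GB-RBM can be illustrated with a short Python sketch. The helper names `to_real_concat` and `from_real_concat` are hypothetical; the sketch only shows how a vector in C^I maps to a vector in R^{2I} and back.

```python
def to_real_concat(z):
    """Map z in C^I to z' = [x^T y^T]^T in R^(2I)."""
    return [zi.real for zi in z] + [zi.imag for zi in z]


def from_real_concat(z_prime):
    """Inverse map: the first half holds real parts, the second half imaginary parts."""
    half = len(z_prime) // 2
    return [complex(x, y) for x, y in zip(z_prime[:half], z_prime[half:])]


z = [1 + 2j, -0.5 + 0j, 3j]
z_prime = to_real_concat(z)
print(z_prime)  # [1.0, -0.5, 0.0, 2.0, 0.0, 3.0]
```

In this representation the real and imaginary parts of one dimension are simply two unrelated coordinates, I positions apart.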

Here, Σ_x = Δ(σ_x^2) and Σ_y = Δ(σ_y^2). In this case, for example, the partial derivatives of the energy function with respect to the bias parameters of the real part and the imaginary part are given by [Equation 33] and [Equation 34], respectively.

On the other hand, setting z = x + iy, b = b^R + ib^I, W = W^R + iW^I, and q = q^R + iq^I and rewriting the energy function of the complex RBM (the right-hand side of [Equation 3]) gives [Equation 35].

Here, the conditions given by [Equation 36] to [Equation 42] are set.

Comparing [Equation 31] with [Equation 35] shows that modeling with the complex RBM of this embodiment includes a cross-term between x and y (x^T Σ_xy^{-1} y). In other words, the complex RBM can be regarded as an extended representation that, in addition to the complex representation by the GB-RBM, one of the conventional methods, takes into account the relationship between the real part and the imaginary part in each feature dimension.
Furthermore, in the complex representation by the GB-RBM, the biases of the real part and the imaginary part of the observed data are computed independently of each other, as shown by [Equation 33] and [Equation 34] (for example, only real-part information is used to update the real-part bias). In contrast, the update formula for the bias parameter of the complex RBM ([Equation 21]) uses both the real part and the imaginary part. Modeling with the complex RBM of this embodiment can therefore learn while preserving the structure of the complex-valued data.
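One way to see where such a cross-term can come from is the standard relation between the variance and pseudo-variance of a scalar complex Gaussian and the covariance of its real and imaginary parts. The Python sketch below uses that textbook relation, with gamma = E[|z|^2] and delta = E[z^2] for a zero-mean z = x + iy. It is not a reproduction of [Equation 35] to [Equation 42], and the function names are hypothetical.

```python
def real_covariance(gamma, delta):
    """Covariance of (x, y) for zero-mean z = x + iy.

    gamma = E[|z|^2] (variance, real), delta = E[z^2] (pseudo-variance, complex).
    """
    var_x = 0.5 * (gamma + delta.real)
    var_y = 0.5 * (gamma - delta.real)
    cov_xy = 0.5 * delta.imag
    return var_x, var_y, cov_xy


def cross_term_coefficient(gamma, delta):
    """Off-diagonal entry of the 2x2 precision matrix, which scales the x*y term."""
    var_x, var_y, cov_xy = real_covariance(gamma, delta)
    det = var_x * var_y - cov_xy ** 2
    return -cov_xy / det


print(cross_term_coefficient(2.0, 0.5 + 1.0j))  # non-zero: x and y are coupled
print(cross_term_coefficient(2.0, 0.5 + 0.0j))  # zero: no x*y term
```

A model with independent real and imaginary coordinates corresponds to the second case, where the coupling between x and y vanishes.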

[6. Experimental example]
Next, an experiment conducted to verify the effectiveness of the complex RBM model of this embodiment is described.
To confirm the effectiveness of the complex RBM model of this embodiment, speech data was coded and the quality of the coded speech was evaluated. Specifically, a quality evaluation experiment on reconstructed speech was conducted using the Repeated Harvard Sentence Prompts (REHASP) corpus, with one repeat of speech from the corpus (30 sentences, about 20 seconds, sampling rate 16 kHz). A complex RBM with 200 hidden elements was trained using, as the visible elements, the complex spectrum (129 dimensions) obtained by a short-time Fourier transform with a window width of 256 samples and a 64-sample overlap. Training used the stochastic gradient method with a learning rate of 0.01, a momentum coefficient of 0.1, a batch size of 100, and 100 iterations. As a comparison method, a GB-RBM (200 hidden elements) whose visible elements were the vector concatenating the real part and the imaginary part of the same complex spectrum data was trained under the same conditions.
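The feature extraction described in this experiment can be sketched with a naive short-time Fourier transform in pure Python. A 256-sample window yields 256 / 2 + 1 = 129 non-negative-frequency bins, which matches the stated dimensionality. The Hann window, the interpretation of the overlap as a hop of 256 - 64 = 192 samples, and the synthetic test tone are assumptions, since this section does not specify them.

```python
import cmath
import math


def stft(signal, window=256, overlap=64):
    """Naive STFT returning, per frame, the window // 2 + 1 non-negative-frequency bins."""
    hop = window - overlap  # assumed interpretation of "64-sample overlap"
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / window) for n in range(window)]
    frames = []
    for start in range(0, len(signal) - window + 1, hop):
        frame = [signal[start + n] * hann[n] for n in range(window)]
        spectrum = [
            sum(frame[n] * cmath.exp(-2j * math.pi * k * n / window)
                for n in range(window))
            for k in range(window // 2 + 1)
        ]
        frames.append(spectrum)
    return frames


# Hypothetical test tone lying exactly on bin 8 of a 256-point transform.
tone = [math.cos(2 * math.pi * 8 * n / 256) for n in range(1024)]
spectra = stft(tone)
peak_bin = max(range(len(spectra[0])), key=lambda k: abs(spectra[0][k]))
print(len(spectra), len(spectra[0]), peak_bin)  # 5 129 8
```

Each frame is a list of 129 complex numbers, so both amplitude and phase are kept and can be passed to a complex-valued model as they are.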

FIG. 11 compares the original amplitude spectrum before coding (FIG. 11A) with the spectrum restored by the complex RBM model of this embodiment (FIG. 11B). In each panel of FIG. 11, the vertical axis represents frequency and the horizontal axis represents time. As FIG. 11 shows, the spectrum restored by the complex RBM model of this embodiment is close to the original spectrum, which confirms that the complex RBM of this embodiment can encode and decode speech spectra with high accuracy.

FIG. 12 compares the complex RBM of this embodiment (Comp RBM) with the conventional RBM (GB-RBM) in terms of the reconstruction error during training. The vertical axis indicates the reconstruction error and the horizontal axis indicates the change over time. "Adam" and "Ada Grad" denote results obtained with Adam and Ada Grad, respectively, as the optimization method; curves without either label are results obtained with the stochastic gradient method.
FIG. 12 shows six cases: the complex RBM applied alone (Comp RBM), the complex RBM combined with another method (Comp RBM + Ada Grad, Comp RBM + Adam), the conventional RBM applied alone (RBM), and the conventional RBM combined with another method (RBM + Ada Grad, RBM + Adam).
For example, the curve for the complex RBM combined with Adam [Comp RBM + Adam: thick solid line] converges faster than the curve for the conventional RBM combined with Adam [RBM + Adam: thin solid line], and its error at convergence is also lower. Likewise, the curve for the complex RBM alone [Comp RBM: two-dot chain line] converges faster than the curve for the conventional RBM alone [RBM: one-dot chain line], and its error at convergence is also lower.
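For readers unfamiliar with the three optimization methods compared in FIG. 12, the following Python sketch shows their textbook update rules on a one-dimensional real-valued quadratic. It is written as minimization of a toy loss, uses real numbers rather than complex parameters, and takes the learning rate 0.01 and momentum coefficient 0.1 from the experimental conditions only for the momentum variant; the AdaGrad and Adam hyperparameters, the toy loss, and the function names are assumptions.

```python
import math


def sgd_momentum(grad, x, lr=0.01, momentum=0.1, steps=2000):
    """Gradient step with a momentum (velocity) term."""
    velocity = 0.0
    for _ in range(steps):
        velocity = momentum * velocity - lr * grad(x)
        x += velocity
    return x


def adagrad(grad, x, lr=0.5, eps=1e-8, steps=2000):
    """AdaGrad: scale each step by the root of the accumulated squared gradients."""
    accum = 0.0
    for _ in range(steps):
        g = grad(x)
        accum += g * g
        x -= lr * g / (math.sqrt(accum) + eps)
    return x


def adam(grad, x, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    """Adam: bias-corrected running averages of the gradient and its square."""
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


def grad(x):
    """Gradient of the toy loss (x - 3)^2, whose minimum is at x = 3."""
    return 2.0 * (x - 3.0)


results = {
    "sgd_momentum": sgd_momentum(grad, 0.0),
    "adagrad": adagrad(grad, 0.0),
    "adam": adam(grad, 0.0),
}
print(results)
```

All three rules consume the same gradient and differ only in how they scale and smooth the step, which is why they can be swapped without changing the model.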

[7. Modifications]
Although the complex RBM model shown in FIG. 3 has a single-layer RBM configuration, the complex RBM of the present invention may also be applied to a DBN (Deep Belief Net), in which RBMs are stacked in multiple layers.
FIG. 13 shows an example in which the complex RBM has three layers.
For the real part, the hidden code h_1 of the first layer and the bias c_1 of the hidden element h are obtained from the visible element z. For the imaginary part, the hidden code h_1 of the first layer and the bias c_1 of the hidden element h are obtained from the visible element z̄. W_1′ and W̄_1′ are the bidirectional coupling weights between the visible element z and the hidden element h_1.
From the hidden code h_1 and the bias c_1 of the real part of the first layer, the hidden code h_2 and the bias c_2 of the second layer are obtained, and from the hidden code h_1 and the bias c_1 of the real part of the first layer, the hidden code h_2 and the bias c_2 of the second layer are obtained. W_2′ and W̄_2′ are the bidirectional coupling weights between the hidden element h_1 and the hidden element h_2.
Further, from the hidden code h_2 and the bias c_2 of the real part of the second layer, the hidden code h_3 and the bias c_3 of the third layer are obtained, and from the hidden code h_2 and the bias c_2 of the real part of the second layer, the hidden code h_3 and the bias c_3 of the third layer are obtained. W_3′ and W̄_3′ are the bidirectional coupling weights between the hidden element h_2 and the hidden element h_3.
In this way, a multi-layered complex RBM can likewise perform coding and decoding.
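The layer-by-layer flow of a stacked model can be sketched as follows. For simplicity the sketch uses real-valued inputs and sigmoid hidden activations in every layer, which is an assumption and does not reproduce the real-part and conjugate paths of FIG. 13; the weights, layer sizes, and function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def encode_layer(v, W, c):
    """Hidden activation probabilities of one layer given its input v."""
    return [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(c))]


def encode_stack(v, layers):
    """Pass the input upward through each (W, c) pair in turn."""
    for W, c in layers:
        v = encode_layer(v, W, c)
    return v


# Three hypothetical layers with sizes 4 -> 3 -> 2 -> 2.
layers = [
    ([[0.1 * (i + j) for j in range(3)] for i in range(4)], [0.0] * 3),
    ([[0.2 * (i - j) for j in range(2)] for i in range(3)], [0.1] * 2),
    ([[0.5, -0.5], [0.25, 0.75]], [0.0, 0.0]),
]
h3 = encode_stack([1.0, 0.0, 1.0, 0.0], layers)
print(h3)
```

The output of each layer serves as the input of the next, so the top-layer activations `h3` play the role of the coded data for the whole stack.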

In the embodiment described above, application to speech data was described as an experimental example, but the complex RBM according to the present invention can be applied to the coding and decoding of various other signals. For example, the complex RBM according to the present invention may be applied to the coding and decoding of image data. It may further be applied to the coding and decoding of data other than speech data and image data.

1 ... coding device, 2 ... decoding device, 11 ... parameter learning unit, 12 ... coding processing unit, 13 ... decoding processing unit, 101 ... CPU (central control unit), 102 ... ROM, 103 ... RAM, 104 ... HDD/SDD, 105 ... connection I/F, 106 ... communication I/F, 111, 121 ... complex data acquisition unit, 112, 122 ... pre-processing unit, 113 ... parameter estimation unit, 123 ... coding unit, 131 ... decoding unit, 132 ... post-processing unit, 133 ... output unit, 1131, 1231, 1311 ... visible element estimation unit, 1132, 1232, 1312 ... hidden element estimation unit

Claims

A coding device comprising:
a parameter learning unit that applies, to learning data, a probabilistic model based on a restricted Boltzmann machine which assumes that coupling weights exist between visible elements representing input data and hidden elements representing latent information, and thereby performs processing to estimate the hidden elements and the coupling weights; and
a coding unit that estimates the hidden elements by applying the probabilistic model based on the restricted Boltzmann machine estimated by the parameter learning unit to input data for coding, and outputs the estimated hidden elements as coded data,
wherein the learning data and the input data for coding are complex-valued data, and the energy function of the probabilistic model based on the restricted Boltzmann machine includes a cross-term between a real part and an imaginary part.

The coding device according to claim 1, wherein the probabilistic model based on the restricted Boltzmann machine is defined by the following equation, where the visible element z consists of I-dimensional data z ∈ C^I, h is the hidden element, θ is the parameter set of the model, the parameters constituting the parameter set θ are b, c, W, γ, and δ, b ∈ C^I is the bias of the visible elements, c ∈ R^J is the bias of the hidden elements, W ∈ C^{I×J} is the complex coupling weight between the visible elements and the hidden elements, an overline on a symbol denotes the complex conjugate, and H denotes the Hermitian transpose.

The coding device according to claim 1 or 2, further comprising a decoding processing unit that decodes the coded data obtained by the coding unit.

A coding method comprising:
a parameter learning process in which an arithmetic processing unit applies, to learning data, a probabilistic model based on a restricted Boltzmann machine which assumes that coupling weights exist between visible elements representing input data and hidden elements representing latent information, and thereby executes processing to estimate the hidden elements and the coupling weights; and
a coding process in which the arithmetic processing unit executes processing to estimate the hidden elements by applying the probabilistic model based on the restricted Boltzmann machine estimated in the parameter learning process to input data for coding, and outputs the estimated hidden elements as coded data,
wherein the learning data obtained in the parameter learning process and the input data for coding obtained in the coding process are complex-valued data, and the energy function of the probabilistic model based on the restricted Boltzmann machine includes a cross-term between a real part and an imaginary part.

A program that causes a computer to execute:
a parameter learning step of obtaining learning data that is complex-valued data and applying to it a probabilistic model based on a restricted Boltzmann machine which assumes that coupling weights exist between visible elements representing input data and hidden elements representing latent information, thereby performing processing to estimate the hidden elements and the coupling weights; and
a coding step of estimating the hidden elements by applying the probabilistic model based on the restricted Boltzmann machine estimated in the parameter learning step to input data that is complex-valued data, and outputting the estimated hidden elements as coded data,
wherein the energy function of the probabilistic model based on the restricted Boltzmann machine includes a cross-term between a real part and an imaginary part.