JP2018028580A

JP2018028580A - Sound source enhancement learning device, sound source enhancement device, sound source enhancement learning method, and program

Info

Publication number: JP2018028580A
Application number: JP2016159692A
Authority: JP
Inventors: Yuma Koizumi; Kenta Niwa; Kazunori Kobayashi
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2016-08-16
Filing date: 2016-08-16
Publication date: 2018-02-22
Anticipated expiration: 2036-08-16
Also published as: JP6563874B2

Abstract

PROBLEM TO BE SOLVED: To provide a sound source enhancement learning device that learns sound source enhancement while suppressing sound quality degradation.

SOLUTION: A sound source enhancement learning device includes: a Wiener filter templating unit 110 that generates a finite number of Wiener filter templates from a set of frequency-domain target sound learning data and frequency-domain noise learning data; an action-value function initialization unit 120 that initializes an action-value function; a sound source enhancement unit 130 that generates an enhanced target sound by applying, to a frequency-domain observation signal, the appropriate Wiener filter template selected on the basis of the value of the action-value function, which is calculated from the Wiener filter templates and a state vector generated from the set of frequency-domain target sound learning data and frequency-domain noise learning data; an action-value function update unit 150 that updates the action-value function using an auditory score calculated from the enhanced target sound; and a convergence determination unit 160 that outputs the action-value function when a prescribed convergence condition is satisfied.

SELECTED DRAWING: Figure 3

Description

The present invention relates to sound source enhancement technology, and more particularly to sound source enhancement using a Wiener filter learned by reinforcement learning.

In sound-based information processing such as speech recognition and sports broadcasting, a specific desired sound (hereinafter, the target sound) must be picked up clearly with a microphone. However, current microphones pick up ambient noise along with the target sound. Because of this noise, speech is buried in noise in speech recognition, which makes recognition difficult, and in sports broadcasting the sounds of play are drowned out by cheering, so the sense of presence is lost.

One technique for solving such problems is sound source enhancement, which clearly emphasizes only the target sound. Sound source enhancement emphasizes only the target sound $S_{\omega,t}$ within the observation signal $X_{\omega,t}$ picked up by a microphone, in which the target sound $S_{\omega,t}$ to be emphasized at time t and the noise $N_{\omega,t}$ are mixed:

$$X_{\omega,t} = S_{\omega,t} + N_{\omega,t} \tag{1}$$

Here, $t \in \{1, \dots, T\}$ and $\omega \in \{1, \dots, \Omega\}$ are the time and frequency indices, respectively.

In sound source enhancement with a Wiener filter, the target sound $S_{\omega,t}$ and the noise $N_{\omega,t}$ are assumed to be uncorrelated, and the signal in which $S_{\omega,t}$ is emphasized (the enhanced target sound) $Y_{\omega,t}$ is obtained by

$$Y_{\omega,t} = G_{\omega,t} X_{\omega,t} \tag{2}$$

$$G_{\omega,t} = \frac{|S_{\omega,t}|^2}{|S_{\omega,t}|^2 + |N_{\omega,t}|^2} \tag{3}$$
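The following NumPy sketch shows how the mixture model, the ideal Wiener filter and its element-wise application fit together on toy spectra. It assumes the standard power-ratio form $|S|^2/(|S|^2+|N|^2)$ for the gain; the function names and the small `eps` guard are illustrative choices, not taken from the patent.

```python
import numpy as np

def ideal_wiener_filter(S, N, eps=1e-12):
    """Ideal Wiener gain per time-frequency bin, assuming G = |S|^2 / (|S|^2 + |N|^2).

    eps is a hypothetical guard against division by zero in silent bins.
    """
    ps = np.abs(S) ** 2
    pn = np.abs(N) ** 2
    return ps / (ps + pn + eps)

def apply_wiener_filter(G, X):
    """Enhanced target sound Y = G * X, applied element-wise to the observed spectrum."""
    return G * X

# Toy complex spectra with Omega = 4 frequency bins and T = 3 frames (illustrative only).
rng = np.random.default_rng(0)
S = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))
N = 0.5 * (rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3)))
X = S + N                        # observation = target + noise
G = ideal_wiener_filter(S, N)    # gains lie in [0, 1]
Y = apply_wiener_filter(G, X)
```

Because the gain is a power ratio, it always stays between 0 and 1, so the filter can only attenuate bins where noise dominates.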

In other words, the key to Wiener-filter-based sound source enhancement is how accurately the Wiener filter $G_{\omega,t}$ can be designed from the observation signal $X_{\omega,t}$.

In recent years, it has been found that statistical machine learning improves the accuracy of Wiener filter design (Non-Patent Document 1). In Wiener filter design based on statistical machine learning, a function $M(X_{\omega,t})$ that predicts the ideal Wiener filter $\hat{G}_{\omega,t}$ is learned from training data:

$$\hat{G}_{\omega,t} = M(X_{\omega,t}) \tag{4}$$

First, a large amount of target sound learning data $S_{\omega,1,\dots,T_{\mathrm{train}}}$ and noise learning data $N_{\omega,1,\dots,T_{\mathrm{train}}}$ is collected. Next, pseudo observations $X_{\omega,1,\dots,T_{\mathrm{train}}}$ and ideal Wiener filters $G_{\omega,1,\dots,T_{\mathrm{train}}}$ are created based on Equations (1) and (3). The function $M(\cdot)$ that converts $X_{\omega,t}$ into $\hat{G}_{\omega,t}$ is then represented by a neural network or a similar model and trained on pairs of the pseudo observations $X_{\omega,1,\dots,T_{\mathrm{train}}}$ and the ideal Wiener filters $G_{\omega,1,\dots,T_{\mathrm{train}}}$. A squared-error objective is often used as the learning criterion for $M(\cdot)$, for example Equation (5), based on the error of the Wiener filter, or Equation (6), based on the error of the target sound.

In the concrete optimization procedure, $M(\cdot)$ is learned by taking the partial derivatives of the squared error of Equation (5) or (6) with respect to the parameters of $M(\cdot)$ and minimizing it with a gradient method.
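Equations (5) and (6) are not reproduced in this text, so the sketch below uses one plausible form of each (a mean squared error on the gains and on the magnitudes) and a deliberately tiny, hypothetical model $M(X) = \mathrm{sigmoid}(w \log|X|^2 + b)$ in place of a neural network. It only illustrates the "differentiate the squared error with respect to the parameters and descend the gradient" procedure; it is not the patent's model.

```python
import numpy as np

def wiener_filter_mse(G_ideal, G_hat):
    """Mean squared error between ideal and predicted Wiener gains (one plausible form of Eq. (5))."""
    return float(np.mean((G_ideal - G_hat) ** 2))

def target_sound_mse(S, G_hat, X):
    """Mean squared error between target and enhanced magnitudes (one plausible form of Eq. (6))."""
    return float(np.mean((np.abs(S) - G_hat * np.abs(X)) ** 2))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_toy_mask_model(X, G_ideal, lr=0.1, steps=300):
    """Fit a hypothetical M(X) = sigmoid(w * log|X|^2 + b) by gradient descent on wiener_filter_mse."""
    f = np.log(np.abs(X) ** 2 + 1e-12)   # per-bin log power feature
    w, b = 0.0, 0.0
    for _ in range(steps):
        g = sigmoid(w * f + b)
        # Chain rule: d/dz mean((G - g)^2) with g = sigmoid(z) and dg/dz = g(1 - g).
        dz = -2.0 * (G_ideal - g) * g * (1.0 - g) / g.size
        w -= lr * float(np.sum(dz * f))
        b -= lr * float(np.sum(dz))
    return w, b
```

Both losses compare outputs frame by frame and bin by bin, which is exactly why they are easy to differentiate; the auditory scores discussed next lack that property.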

However, a Wiener filter designed with a function $M(\cdot)$ learned using, for example, Equation (5) as the criterion does emphasize the target sound, but its sound quality degrades because of effects such as nonlinear distortion. This is because Equation (5) measures only how close the spectra of the target sound and the output sound are, and does not take the characteristics of human hearing into account.

Performance indices for sound source enhancement that take the characteristics of human hearing into account include PESQ (Non-Patent Document 2), STOI (Non-Patent Document 3), and PEASS (Non-Patent Document 4). Hereinafter, these performance indices are collectively called auditory scores. The higher an auditory score, the better the sound quality that human listeners perceive.

If statistical sound source enhancement could be learned so as to increase an auditory score with this property, it would be possible to design a Wiener filter that does not degrade sound quality.

Non-Patent Document 1: Y. Xu, J. Du, L. R. Dai and C. H. Lee, "A regression approach to speech enhancement based on deep neural networks", IEEE/ACM Trans. Audio, Speech and Language Processing, Vol. 23, No. 1, pp. 7-19, 2015.
Non-Patent Document 2: A. W. Rix, J. G. Beerends, M. P. Hollier and A. P. Hekstra, "Perceptual evaluation of speech quality (PESQ): A new method for speech quality assessment of telephone networks and codecs", in Proc. ICASSP '01, pp. 749-752, 2001.
Non-Patent Document 3: C. H. Taal, R. C. Hendriks, R. Heusdens and J. Jensen, "An Algorithm for Intelligibility Prediction of Time-Frequency Weighted Noisy Speech", IEEE Trans. Audio, Speech and Language Processing, Vol. 19, No. 7, pp. 2125-2136, 2011.
Non-Patent Document 4: V. Emiya, E. Vincent, N. Harlander and V. Hohmann, "Subjective and objective quality assessment of audio source separation", IEEE Trans. Audio, Speech and Language Processing, Vol. 19, No. 7, pp. 2046-2057, 2011.

However, learning statistical sound source enhancement so as to maximize the auditory score poses two problems.
(1) The auditory score cannot be computed with a simple formula such as Equation (5) or (6); it is computed with a complex procedure. Its derivative is therefore difficult to obtain, and the auditory score cannot be maximized directly with a gradient method or the like.
(2) The auditory score is not obtained frame by frame, as Equations (5) and (6) are; it can be computed only after an entire unit of sound source data (for example, one whole utterance in the case of speech) has been processed.

For these reasons, it has been difficult to learn statistical sound source enhancement that increases the auditory score; in other words, it has been difficult to design a Wiener filter that does not degrade sound quality.

Accordingly, an object of the present invention is to provide a sound source enhancement learning device that uses reinforcement learning to learn sound source enhancement while suppressing sound quality degradation.

One aspect of the present invention includes: a Wiener filter templating unit that generates a finite number of Wiener filters as Wiener filter templates from a set of frequency-domain target sound learning data and frequency-domain noise learning data; an action-value function initialization unit that initializes an action-value function; a sound source enhancement unit that generates a state vector expressed using a frequency-domain observation signal generated from the set of frequency-domain target sound learning data and frequency-domain noise learning data, and that generates an enhanced target sound by applying to the frequency-domain observation signal the optimal Wiener filter template selected on the basis of the value of the action-value function calculated using the state vector and the Wiener filter templates; an action-value function update unit that updates the action-value function using an auditory score calculated from the enhanced target sound; and a convergence determination unit that outputs the action-value function when a predetermined convergence condition is satisfied.

Another aspect of the present invention includes: a Wiener filter templating unit that generates, as Wiener filter templates, Wiener filters satisfying a first criterion from a set of frequency-domain target sound learning data and frequency-domain noise learning data; an action-value function initialization unit that generates an initial value of an action-value function; a sound source enhancement unit that generates a state vector expressed using a frequency-domain observation signal generated from the set of frequency-domain target sound learning data and frequency-domain noise learning data, and that generates an enhanced target sound by applying to the frequency-domain observation signal the optimal Wiener filter template selected on the basis of the value of the action-value function calculated using the state vector and the Wiener filter templates; an action-value function update unit that, using an auditory score obtained by evaluating the enhanced target sound, updates the action-value function so that a Wiener filter template satisfying both the first criterion and a second criterion is selected; and a convergence determination unit that outputs the action-value function when a predetermined convergence condition is satisfied.

According to the present invention, reinforcement learning makes it possible to learn sound source enhancement while suppressing sound quality degradation.

FIG. 1 is a diagram explaining the concept of reinforcement learning using a game.
FIG. 2 is a diagram explaining the weighting used to suppress a decrease in the auditory score.
FIG. 3 is a block diagram showing the configuration of the sound source enhancement learning device 100.
FIG. 4 is a flowchart showing the operation of the sound source enhancement learning device 100.
FIG. 5 is a block diagram showing the configuration of the Wiener filter templating unit 110.
FIG. 6 is a flowchart showing the operation of the Wiener filter templating unit 110.
FIG. 7 is a block diagram showing the configuration of the sound source enhancement unit 130.
FIG. 8 is a flowchart showing the operation of the sound source enhancement unit 130.
FIG. 9 is a block diagram showing the configuration of the sound source enhancement learning device 200.
FIG. 10 is a flowchart showing the operation of the sound source enhancement learning device 200.
FIG. 11 is a block diagram showing the configuration of the auditory score binarization unit 240.
FIG. 12 is a flowchart showing the operation of the auditory score binarization unit 240.
FIG. 13 is a block diagram showing the configuration of the sound source enhancement learning device 300.
FIG. 14 is a flowchart showing the operation of the sound source enhancement learning device 300.
FIG. 15 is a block diagram showing the configuration of the update weight calculation unit 340.
FIG. 16 is a flowchart showing the operation of the update weight calculation unit 340.
FIG. 17 is a block diagram showing the configuration of the sound source enhancement learning device 400.
FIG. 18 is a flowchart showing the operation of the sound source enhancement learning device 400.
FIG. 19 is a block diagram showing the configuration of the action-value function initialization unit 420.
FIG. 20 is a flowchart showing the operation of the action-value function initialization unit 420.
FIG. 21 is a block diagram showing the configuration of the sound source enhancement device 500.
FIG. 22 is a flowchart showing the operation of the sound source enhancement device 500.

Embodiments of the present invention are described in detail below. Components having the same function are given the same reference numerals, and duplicate descriptions are omitted.

First, reinforcement learning is described.
<Reinforcement Learning>
Reinforcement learning is a type of machine learning that deals with problems in which an agent in some environment observes the current state and decides on an action. At time t, the agent selects one action $a \in \{1, \dots, A\}$ out of A types of actions based on the observation $x_t$ from the environment (that is, the current state of the environment). The action $a_t$ at time t is determined from the value of the action-value function $Q(x, a)$. The observation (state) x is generally expressed as a vector. In summary, the flow is as follows.
(1) At time t, the agent receives the observation $x_t$ from the environment.
(2) Based on the action-value function $Q(x, a)$, the agent determines and executes the optimal action $a_t$ at time t, generally as

$$a_t = \arg\max_{a} Q(x_t, a) \tag{7}$$

(3) The environment changes to the state $x_{t+1}$ as a result of the agent's action $a_t$.
(4) Based on the state $x_{t+1}$, the environment returns a reward $r_t$ for the action $a_t$ to the agent.
(5) Return to (1).
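Steps (1) to (5) can be sketched as a short loop. This is a toy illustration, not part of the patent: the action-value function is a hypothetical lookup table `Q[state, action]`, and the environment is supplied as two illustrative callables for the state transition and the reward.

```python
import numpy as np

def greedy_action(q_values):
    """Eq. (7): pick the action index a with the largest Q(x_t, a)."""
    return int(np.argmax(q_values))

def run_agent(Q, transition, reward, x0, steps):
    """Loop over steps (1)-(5) with a tabular Q indexed as Q[state, action] (toy setting)."""
    x, rewards = x0, []
    for _ in range(steps):
        a = greedy_action(Q[x])            # (1)-(2): observe x_t and choose a_t
        x_next = transition(x, a)          # (3): environment moves to x_{t+1}
        rewards.append(reward(x_next, a))  # (4): reward r_t based on x_{t+1}
        x = x_next                         # (5): back to (1)
    return rewards

Q = np.array([[0.0, 1.0], [1.0, 0.0]])   # hypothetical 2-state, 2-action table
rewards = run_agent(Q, lambda x, a: a, lambda x_next, a: 1.0 if x_next == 1 else 0.0, x0=0, steps=4)
# rewards == [1.0, 0.0, 1.0, 0.0]
```

In the toy environment the action directly selects the next state, so the greedy agent alternates between the two states and is rewarded every other step.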

FIG. 1 illustrates this flow using a game. First, the agent decides how to move the lever according to the game state shown on the current game screen. Following that decision, the agent operates the lever. The game screen then changes and the score is updated.

The agent operates the lever so that the score at the end of the game is as high as possible. Practicing the game means playing it many times and acquiring a criterion (the action-value function $Q(x, a)$) for how to move the lever on each kind of game screen so that the final score becomes high.

If the number of states the observed x can take and the number of action patterns A are small, the action-value function $Q(x, a)$ can easily be represented as a table. In information processing on real sounds and images, however, the observed state x takes continuous values such as sound pressures or pixel values, so the number of states is almost always enormous.

Therefore, the action-value function $Q(x, a)$ is often approximated by some other function. For example, an action-value function based on a deep neural network (DNN) passes the state $x_t$ through L layers and reads $Q(x_t, a)$ from the a-th element of the network output, either in a discriminative (classification-type) form, Equation (11), or in a regression-type form, Equation (12). Here, L is the number of network layers, $W^{(j)}$ and $b^{(j)}$ are the weight matrix and bias vector of the j-th layer ($j \in \{1, \dots, L\}$), and $F(\cdot)$ represents a nonlinear transformation such as a sigmoid function. $u_t^{(j)}$ is the output (vector) of the j-th layer, and $u_t^{(j)}(a)$ denotes the a-th element of the vector $u_t^{(j)}$.
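Equations (8) to (12) are not reproduced in this text, so the following forward pass is only a sketch under stated assumptions: hidden layers compute $u^{(j)} = F(W^{(j)} u^{(j-1)} + b^{(j)})$ with $F$ a sigmoid and $u^{(0)} = x_t$, the last layer is linear, the regression-type output is $Q(x_t, a) = u_t^{(L)}(a)$, and the classification-type output additionally applies a sigmoid. The layer sizes and function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dnn_action_values(x, weights, biases, output="regression"):
    """Return the vector [Q(x, 1), ..., Q(x, A)] from a hypothetical L-layer network.

    Hidden layers: u^(j) = F(W^(j) u^(j-1) + b^(j)) with F = sigmoid and u^(0) = x.
    Final layer is linear; "classification" squashes it with a sigmoid (assumed form).
    """
    u = np.asarray(x, dtype=float)
    for W, b in zip(weights[:-1], biases[:-1]):
        u = sigmoid(W @ u + b)
    u = weights[-1] @ u + biases[-1]       # u^(L): one element per action a
    if output == "classification":
        return sigmoid(u)
    return u

rng = np.random.default_rng(0)
D, H, A = 6, 8, 3      # state dimension, hidden units, number of actions (illustrative)
weights = [rng.standard_normal((H, D)), rng.standard_normal((A, H))]
biases = [np.zeros(H), np.zeros(A)]
q = dnn_action_values(rng.standard_normal(D), weights, biases)
```

Reading one output element per action means a single forward pass scores all A actions at once, which is what the arg max of Eq. (7) needs.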

Learning in reinforcement learning is the problem of learning the action-value function $Q(x, a)$ so that the optimal action can always be selected. The optimal action-value function is the one that gives a policy maximizing the sum $R_t$ of the rewards r obtainable from the current time t into the infinite future, defined by Equation (13):

$$R_t = \sum_{k=0}^{\infty} \gamma^{k} r_{t+k} \tag{13}$$

Here, $\gamma$ ($0 \le \gamma < 1$) is a discount rate set so that $R_t$ takes a finite value.
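A minimal sketch of the discounted return, truncated to a finite list of rewards, computed with the backward recursion $R_t = r_t + \gamma R_{t+1}$. The function name is illustrative.

```python
def discounted_return(rewards, gamma):
    """R_t = sum_k gamma^k * r_{t+k}, truncated to the rewards provided (Eq. (13) over a finite horizon)."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must satisfy 0 <= gamma < 1")
    R = 0.0
    for r in reversed(rewards):   # backward recursion: R_t = r_t + gamma * R_{t+1}
        R = r + gamma * R
    return R

print(discounted_return([1.0, 1.0, 1.0], 0.5))   # 1 + 0.5 + 0.25 = 1.75
```

The restriction $\gamma < 1$ is what keeps the infinite sum finite: with a constant reward r the series converges to $r / (1 - \gamma)$, for example 10 when r = 1 and $\gamma$ = 0.9.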

Hereinafter, "the action-value function is optimal" means that it is the action-value function that maximizes the value of Equation (13).

If the optimal action-value function $Q^{\mathrm{opt}}(x, a)$ were available, the action-value function $Q(x, a \mid \Theta)$ determined by the parameter $\Theta$ could be learned by minimizing

$$L_\Theta = \mathbb{E}\left[\left(Q^{\mathrm{opt}}(x, a) - Q(x, a \mid \Theta)\right)^2\right] \tag{14}$$

This allows $Q(x, a \mid \Theta)$ to be learned with, for example, a gradient method, that is, by repeatedly updating the parameter $\Theta$ with

$$\Theta \leftarrow \Theta - \alpha \nabla_\Theta L_\Theta \tag{15}$$

Here, $\alpha$ is a positive real number (the step size) and $\nabla$ denotes the nabla (gradient) operator.

However, because the optimal action-value function $Q^{\mathrm{opt}}(x, a)$ is unknown, Equation (14) cannot be computed as it stands unless $Q^{\mathrm{opt}}$ is approximated in some way.

The experience replay algorithm therefore determines and executes actions according to the current action-value function $Q(x, a \mid \Theta)$ and updates the parameter $\Theta$ according to the rewards $r_t$ obtained. In the earlier game example, this corresponds to playing the game many times and improving the lever-operation policy based on the results.

First, following the current action-value function $Q(x, a \mid \Theta)$, tuples of observations $x_{1,\dots,T}$, actions $a_{1,\dots,T}$ and rewards $r_{1,\dots,T}$ are collected. The target action value $Q_t^{\mathrm{target}}$ (that is, an approximation of $Q^{\mathrm{opt}}(x, a)$) is then set as

$$Q_t^{\mathrm{target}} = r_t + \gamma \max_{a'} Q(x_{t+1}, a' \mid \Theta) \tag{16}$$

Next, the function representing the error from the target value (hereinafter also called the error function) $L_\Theta$ is computed as

$$L_\Theta = \frac{1}{T} \sum_{t=1}^{T} \left(Q_t^{\mathrm{target}} - Q(x_t, a_t \mid \Theta)\right)^2 \tag{17}$$

and the parameter $\Theta$ is updated using Equation (15).
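The following sketch performs one such update for a hypothetical linear action-value function $Q(x, a \mid \Theta) = \Theta_a \cdot x$, chosen only because its gradient is easy to write by hand. It assumes the usual DQN-style convention that the target built from $r_t + \gamma \max_{a'} Q(x_{t+1}, a' \mid \Theta)$ is held fixed while differentiating, and that the error is averaged over the T collected tuples; none of these names come from the patent.

```python
import numpy as np

def q_values(theta, x):
    """Hypothetical linear action-value function: Q(x, a | theta) = theta[a] @ x."""
    return theta @ x

def replay_update(theta, xs, actions, rewards, gamma=0.9, alpha=0.01):
    """One gradient step (Eq. (15)) on the mean squared error to the targets of Eq. (16).

    xs has shape (T + 1, D): xs[t + 1] is the state reached after taking actions[t] in xs[t].
    Targets are held fixed while differentiating, as is usual in DQN-style training.
    """
    T = len(actions)
    grad = np.zeros_like(theta)
    loss = 0.0
    for t in range(T):
        target = rewards[t] + gamma * np.max(q_values(theta, xs[t + 1]))   # Q_t^target
        err = target - q_values(theta, xs[t])[actions[t]]
        loss += err ** 2 / T
        grad[actions[t]] -= 2.0 * err * xs[t] / T   # dL/dtheta[a_t] for the linear model
    return theta - alpha * grad, loss
```

With all-zero parameters every Q value is 0, so each target equals its reward and the returned loss is simply the mean squared reward; repeated calls then pull $\Theta$ toward the bootstrapped targets.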

However, if the experience replay algorithm always determines the action $a_t$ at time t with Equation (7), the selected actions become biased depending on the initial values.

The ε-greedy algorithm therefore selects the action $a_t$ at random with probability ε at each time t. This prevents bias in the selected actions and makes it possible to learn a better action-value function $Q(x, a \mid \Theta)$.
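A minimal ε-greedy selector, sketched with NumPy's `Generator` API: with probability ε it returns a uniformly random action index, otherwise the greedy choice of Eq. (7). The function name and the example Q values are illustrative.

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon choose a uniformly random action; otherwise the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

rng = np.random.default_rng(0)
q = np.array([0.1, 0.7, 0.2])
picks = [epsilon_greedy(q, 0.1, rng) for _ in range(1000)]
# Mostly action 1, with occasional random exploration of actions 0 and 2.
```

Because the random branch can also land on the greedy action, the greedy action is chosen with probability $1 - \varepsilon + \varepsilon / A$ rather than exactly $1 - \varepsilon$.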

Next, the principle of the invention is described.
<Principle of the invention>
The basic framework of reinforcement learning for statistical sound source enhancement that improves the auditory score is as follows: the action-value function $Q(x, a \mid \Theta)$ is learned according to steps (1) to (4).
(1) The observation of the environment at the current time t is the observation signal $X_{\omega,t}$, the agent is the function $M(\cdot)$, and the reward is the auditory score $Z_{iter}$ (where iter is an index indicating the iteration). The Wiener filter determined by $M(\cdot)$ (more precisely, the number identifying that Wiener filter) therefore corresponds to the action.
(2) Using the target sound learning data $S_{\omega,1,\dots,T_{\mathrm{train}}}$ and the noise learning data $N_{\omega,1,\dots,T_{\mathrm{train}}}$, generate the observation signal $X_{\omega,1,\dots,T_{\mathrm{train}}}$ as a pseudo observation, and perform sound source enhancement according to the current action-value function $Q(x, a \mid \Theta)$ of $M(\cdot)$ (the state x is a variable expressed using the observation signal $X_{\omega,t}$, and the action a is a number identifying a Wiener filter). Note that $M(\cdot)$, the function that estimates the Wiener filter, must first be converted into a form realized by a finite number of action patterns.
(3) Compute the auditory score $Z_{iter}$ from the result of sound source enhancement.
(4) Update the action-value function $Q(x, a \mid \Theta)$ so as to maximize the reward, that is, the auditory score $Z_{iter}$.

However, the following four problems remain in this learning procedure.

Problem 1 must be solved: the above procedure cannot be executed unless it is. Problems 2 to 4 are optional; solving them further improves the learning accuracy of sound source enhancement.

(Problem 1) The function $M(\cdot)$ returns a continuous-valued Wiener filter, so the values it can take form infinitely many patterns. In reinforcement learning, however, the action patterns of the agent $M(\cdot)$ must be reduced to a finite number A; that is, the set of Wiener filters used in this learning must be finite.

(Problem 2) The auditory score takes continuous values, not binary values such as winning or losing a game. In this case, the action-value function $Q(x, a \mid \Theta)$ is generally designed as a regression-type function that estimates the auditory score directly, as in Equation (12) (for example, multiple regression analysis, a mapping from real numbers to real numbers). However, representing $Q(x, a \mid \Theta)$ with a regression-type function generally widens the solution space, which makes learning difficult.

To solve this problem, the auditory score must be binarized and the action-value function $Q(x, a \mid \Theta)$ described by a discriminative model, as in Equation (11) (for example, logistic regression, a mapping from real numbers to two values).

(Problem 3) The auditory score cannot be evaluated until one whole utterance has been enhanced. Moreover, the score drops even when the sound quality degrades in only one local part of an utterance and every other part is enhanced perfectly. Because the agent receives only the score itself and cannot tell why it dropped, it judges even the perfectly enhanced parts to be bad actions (see FIG. 2).

To solve this problem, an algorithm is needed that corrects only the portions (frames) that cause the decrease in the auditory score.

(Problem 4) In reinforcement learning, the optimal target value of the action-value function $Q(x, a \mid \Theta)$ is not given; the target value is instead given sequentially by Equation (16). As a result, learning tends to depend on the initial value and to fall into a local optimum.

To solve this problem, a method is needed for determining an initial value suited to sound source enhancement (that is, a method for initializing the action-value function $Q(x, a \mid \Theta)$).

Methods for solving the above four problems are described below.
(Solution to Problem 1: Wiener filter templates)
The Wiener filter $G_{\omega,t}$ is designed by Equation (3). Because the numerator of Equation (3) is determined by the target sound $S_{\omega,t}$, the filters can be grouped into patterns to some extent according to the nature of the target sound. For example, if the target sound $S_{\omega,t}$ is speech, Wiener filters that emphasize vowels are expected to appear frequently; if it is a sports sound, Wiener filters that emphasize impulsive sounds such as kicks are expected to appear frequently. In other words, the Wiener filter $G_{\omega,t}$ can be adequately represented by a small number of templates, and the action-value function $Q(x, a \mid \Theta)$ becomes a function that decides which template (corresponding to an action) to select at time t.

Therefore, to solve Problem 1, it suffices to generate A Wiener filter templates from the target sound learning data $S_{\omega,1,\dots,T_{\mathrm{train}}}$ and the noise learning data $N_{\omega,1,\dots,T_{\mathrm{train}}}$. Specifically, ideal Wiener filters $G_{\omega,1,\dots,T_{\mathrm{train}}}$ are first generated from $S_{\omega,1,\dots,T_{\mathrm{train}}}$ and $N_{\omega,1,\dots,T_{\mathrm{train}}}$ using Equation (3). Next, the Wiener filters $G_{\omega,1,\dots,T_{\mathrm{train}}}$ are clustered with K-means clustering, a histogram method, or the like to generate A Wiener filter templates $G_{\omega,1,\dots,A}$. Here, a template is a cluster center when K-means clustering, a GMM (Gaussian Mixture Model), or vector quantization clustering is used, and a histogram bin when the histogram method is used.
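As one way to picture the templating step, the sketch below clusters per-frame Wiener filter vectors (one vector over frequency per frame) with plain K-means (Lloyd's algorithm) in NumPy. K-means is only one of the options named above, and the function name, iteration count and empty-cluster handling are illustrative assumptions. The returned template index plays the role of the action a.

```python
import numpy as np

def make_wiener_templates(G, A, iters=50, seed=0):
    """Cluster per-frame Wiener filters into A templates with plain K-means (Lloyd's algorithm).

    G has shape (Omega, T): column t is the ideal Wiener filter of frame t.
    Returns (templates of shape (Omega, A), label of each frame).
    """
    frames = G.T                                   # (T, Omega): one filter vector per frame
    rng = np.random.default_rng(seed)
    centers = frames[rng.choice(len(frames), size=A, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every frame to every center: shape (T, A).
        d = ((frames[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for a in range(A):
            members = frames[labels == a]
            if len(members) > 0:                   # keep the old center if a cluster empties
                centers[a] = members.mean(axis=0)
    return centers.T, labels
```

For example, if the filters of half the frames pass almost everything (gain near 0.9) and the other half suppress almost everything (gain near 0.1), two templates recover those two gain profiles, and each frame's label says which one it would select.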

(Solution for Problem 2: Threshold decision on the auditory score)
In reinforcement learning, a reward that takes only two values allows a discriminative (classification) model to be applied, so the action-value function can be estimated with high accuracy. Therefore, the auditory score is converted into a binarized auditory score by a threshold decision.

Let Z_iter be the auditory score in the iter-th update. The binarized auditory score r_t^iter, which is the reward at time t in the iter-th update, is determined with a threshold φ by Equation (19): r_t^iter = R if Z_iter ≥ φ, and r_t^iter = −R if Z_iter < φ.

The threshold φ is updated every ITER_thres-update updates by the following procedure.
(1) Let φ̄ be the average of the auditory scores Z_iter over the last ITER_thres-update updates.
(2) If φ̄ is larger than φ − β, set φ ← φ̄ + β (otherwise, the threshold φ is left unchanged). Here, β is a positive real number representing a bias applied to the threshold.
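Equation (19) and the threshold update rule above can be combined into one small stateful object. Equation (19)'s two cases are also spelled out for the binarization unit 241 in Embodiment 2. The sketch below is illustrative, and the class name `BinaryReward` is hypothetical. The reward is computed with the current φ before the threshold is updated, an ordering the text does not fix. The default values are those recommended for Embodiment 2.

```python
class BinaryReward:
    """Binarized auditory score (Eq. (19)) with the periodic threshold update (1)-(2).
    Defaults follow the suggested R = 0.05, beta = 3, phi0 = 0, ITER_thres-update = 20."""

    def __init__(self, R=0.05, beta=3.0, phi0=0.0, period=20):
        self.R, self.beta, self.phi, self.period = R, beta, phi0, period
        self._history = []

    def __call__(self, z):
        r = self.R if z >= self.phi else -self.R    # Eq. (19) with the current threshold
        self._history.append(z)
        if len(self._history) == self.period:      # every ITER_thres-update scores
            mean = sum(self._history) / len(self._history)   # step (1)
            if mean > self.phi - self.beta:                   # step (2)
                self.phi = mean + self.beta
            self._history.clear()
        return r
```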

However, when the binarized auditory score r_t^iter is determined by Equation (19), the magnitude of the auditory score Z_iter is not taken into account at all. That is, r_t^iter = R both when Z_iter = φ + 0.01 and when Z_iter = φ + 1000. Therefore, if learning does not progress well with Equation (19), the binarized auditory score r_t^iter may instead be set using Equation (20).

Here, Υ is a positive real number that controls r_t^iter so that its maximum possible value is R. The value of Υ should be chosen according to the type of auditory score Z used; for example, when 0 < Z < 5, Υ may be set to about 50.

In the above description, "greater than or equal to" may be replaced with "greater than", "greater than" with "greater than or equal to", "less than or equal to" with "less than", and "less than" with "less than or equal to", as appropriate.

(Solution for Problem 3: Computing update weights based on the squared error)
This section describes a technique for correcting only those portions of an utterance that cause the auditory score to drop (see FIG. 2). The squared error alone cannot maximize sound quality. Still, the squared error between the enhanced target sound Y_{ω,t} and the target sound S_{ω,t} indicates how well the target sound is enhanced. We therefore propose an algorithm that adjusts the update amount of the parameter Θ according to the squared error of each frame t. The function L_Θ, which represents the error from the target value, is computed using an update weight w_t calculated for each frame t. The details follow.

First, the squared error E_t between the enhanced target sound Y_{ω,t} and the target sound S_{ω,t} is calculated for each frame t by Equation (21).

The squared error E_t may also be calculated from spectra compressed by a mel filter bank or the like.

Next, the squared error E_t is normalized by Equation (22). The normalized E_t is referred to as the normalized squared error.

Next, the update weight w_t is calculated by Equation (23). When the auditory score Z_iter is greater than or equal to the threshold φ, w_t is 1 minus the normalized squared error. When Z_iter is smaller than φ, w_t is the normalized squared error itself.

In other words, when the auditory score is good, the action-value function Q(x, a|Θ) must be updated so that it becomes larger, so frames with a small normalized squared error are updated more strongly. Conversely, when the auditory score is poor, Q(x, a|Θ) must be updated so that it becomes smaller, so frames with a large normalized squared error are updated more strongly. Figuratively, the former amounts to praising and the latter to scolding.
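The text states only the case split of Equation (23) in words. The Python sketch below therefore fills in Equations (21) and (22) with assumed forms: E_t as the sum over ω of |Y_{ω,t} − S_{ω,t}|², and min-max scaling of E_t to [0, 1] across the frames of one utterance. Both assumptions are marked in the code, and the function names are hypothetical.

```python
def frame_squared_errors(Y, S):
    """Y, S: lists of frames of complex STFT bins.
    Assumed form of Eq. (21): E_t = sum_w |Y_{w,t} - S_{w,t}|^2."""
    return [sum(abs(y - s) ** 2 for y, s in zip(fy, fs)) for fy, fs in zip(Y, S)]

def normalize_errors(E, eps=1e-12):
    """Assumed stand-in for Eq. (22): min-max scaling into [0, 1]."""
    lo, hi = min(E), max(E)
    return [(e - lo) / (hi - lo + eps) for e in E]

def update_weights(E_norm, z, phi):
    """Eq. (23) as described in the text."""
    if z >= phi:
        return [1.0 - e for e in E_norm]  # good score: "praise" low-error frames
    return list(E_norm)                   # poor score: "scold" high-error frames
```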

Next, the function L_Θ representing the error from the target value is calculated by Equation (24); that is, Equation (24) replaces Equation (18).

Finally, the parameter Θ is updated according to Equation (15) using this L_Θ.

The above describes computing E_t from the squared error to obtain the update weight w_t, but the error used for E_t is not limited to the squared error. All that is needed is a per-frame measure of how much the enhanced target sound Y_{ω,t} is distorted relative to the target sound S_{ω,t}. Similar effects can therefore be obtained by using, for example, the signal-to-distortion ratio (SDR), the signal-to-interference ratio (SIR), or a weighted sum of these instead of the squared error. Accordingly, the squared error and the normalized squared error may also be called simply the error and the normalized error.
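As an illustration of swapping in SDR, a simplified per-frame SDR can be sketched as follows. It uses 10·log10(Σ|S|² / Σ|Y − S|²), which is not the full BSS Eval SDR; that metric first projects Y onto S. Because a higher SDR means less distortion, the sketch negates it to obtain an error-like E_t. The function names are hypothetical.

```python
import math

def frame_sdr_db(Y, S, eps=1e-12):
    """Simplified per-frame SDR in dB: 10*log10(sum|S|^2 / sum|Y - S|^2)."""
    out = []
    for fy, fs in zip(Y, S):
        signal = sum(abs(s) ** 2 for s in fs)
        distortion = sum(abs(y - s) ** 2 for y, s in zip(fy, fs))
        out.append(10.0 * math.log10((signal + eps) / (distortion + eps)))
    return out

def sdr_as_error(Y, S):
    """Negate the SDR so that, like the squared error, larger means worse."""
    return [-v for v in frame_sdr_db(Y, S)]
```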

(Solution for Problem 4: Determining initial values based on the squared error)
This section describes how to initialize the action-value function Q(x, a|Θ). For example, when Q(x, a|Θ) is expressed by Equations (8) to (11), this amounts to determining the initial values of W^(j) and b^(j). As noted above, the squared error between the enhanced target sound Y_{ω,t} and the target sound S_{ω,t} is an index of how well the target sound is enhanced. The initial Q(x, a|Θ) is therefore chosen so that, for each frame of the learning data, it outputs the number of the Wiener filter template that is optimal in the squared-error sense. The procedure is described below.

First, the ideal Wiener filters G_{ω,1,…,Ttrain} are generated by Equation (3) from the target sound learning data S_{ω,1,…,Ttrain} and the noise learning data N_{ω,1,…,Ttrain}. The observation signals X_{ω,1,…,Ttrain} are also generated from S_{ω,1,…,Ttrain} and N_{ω,1,…,Ttrain} at this point.

Next, for each frame of the learning data (t = 1, …, Ttrain), the number a_{1,…,Ttrain} of the Wiener filter template that is optimal in the squared-error sense is determined by Equation (25).

Here, a is the number identifying a template described in (Solution for Problem 1: Templating the Wiener filter), and a ∈ {1, …, A}.

The squared error in Equation (25) may also be calculated from Wiener filters compressed by a mel filter bank or the like.
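The text describes Equation (25) only as choosing, for each frame, the template that is optimal in the squared-error sense, with the error computed between Wiener filters. Assuming it is a nearest-template search, a_t = argmin_a Σ_ω (G_{ω,t} − G_{ω,a})², where G_{ω,t} is the ideal filter of frame t and G_{ω,a} the a-th template, the labeling step can be sketched as follows. The function name is hypothetical.

```python
def best_template_labels(G, templates):
    """For each ideal filter G[t], return the 1-based number a of the template
    with the smallest squared error (assumed form of Eq. (25))."""
    labels = []
    for g in G:
        errs = [sum((x - y) ** 2 for x, y in zip(g, tpl)) for tpl in templates]
        labels.append(1 + errs.index(min(errs)))
    return labels
```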

Finally, the action-value function Q(x, a|Θ) is trained discriminatively so that it outputs the template number a_t when given the state vector x_t generated from the observation signals X_{ω,1,…,Ttrain}. In the preceding example, this determines the initial values of W^(j) and b^(j). Any method may be used for this discriminative training; for example, logistic regression may be used.
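Logistic regression, the example the text names, generalizes to A > 2 templates as softmax (multinomial logistic) regression, sketched below in pure Python. It trains a single linear layer on the pairs (x_t, a_t) by stochastic gradient descent on the cross-entropy. With the multilayer network of Equations (8) to (11), the same loss would instead be backpropagated through all layers. The learning rate, epoch count, and function names are illustrative assumptions.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def fit_softmax_regression(X, labels, A, lr=0.5, epochs=200):
    """X: state vectors x_t; labels: 1-based template numbers a_t. Returns (W, b)."""
    D = len(X[0])
    W = [[0.0] * D for _ in range(A)]
    b = [0.0] * A
    for _ in range(epochs):
        for x, a in zip(X, labels):
            p = softmax([sum(w * v for w, v in zip(W[k], x)) + b[k] for k in range(A)])
            for k in range(A):
                g = p[k] - (1.0 if k == a - 1 else 0.0)  # d(cross-entropy)/d(score_k)
                b[k] -= lr * g
                for d in range(D):
                    W[k][d] -= lr * g * x[d]
    return W, b

def predict(W, b, x):
    """1-based number of the highest-scoring template."""
    scores = [sum(w * v for w, v in zip(Wk, x)) + bk for Wk, bk in zip(W, b)]
    return 1 + scores.index(max(scores))
```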

The state vectors may be generated in the same way as in the sound source enhancement unit 130 of Embodiment 1, described later.

<Embodiment 1>
The sound source enhancement learning device 100 of Embodiment 1 is described below with reference to FIGS. 3 and 4. FIG. 3 is a block diagram showing the configuration of the sound source enhancement learning device 100. FIG. 4 is a flowchart showing its operation. As shown in FIG. 3, the sound source enhancement learning device 100 includes the following units:
- a Wiener filter templating unit 110,
- an action-value function initialization unit 120,
- a sound source enhancement unit 130,
- an auditory score calculation unit 140,
- an action-value function update unit 150, and
- a convergence determination unit 160.

The sound source enhancement learning device 100 is connected to a target sound learning data recording unit 910 and a noise learning data recording unit 920. These units store target sounds and noises collected in advance as learning data. The target sound should preferably be a clean sound that contains no noise at all.

The target sound and noise recorded in the target sound learning data recording unit 910 and the noise learning data recording unit 920 are preferably time-domain signals. The time-domain target sound and time-domain noise are recorded separately for each sound source. For example, when the target sound is speech, it is divided into utterances. For simplicity, a unit of target sound is hereinafter called an utterance even when the target sound is not speech.

In the following description, the target sound and noise recorded in the target sound learning data recording unit 910 and the noise learning data recording unit 920 are assumed to be time-domain target sound and time-domain noise divided into utterances.

The parameters used by each component of the sound source enhancement learning device 100 may be input from outside, like the target sound learning data and noise learning data, or preset in each component. These include, for example, parameters of learning algorithms such as reinforcement learning and discriminative learning. Recommended values are given where each relevant component is described.

One example of such a parameter is the number of action patterns (number of templates) A used by the Wiener filter templating unit 110. Because A should preferably be adjusted according to the number of utterances and the complexity of the target sound, it is better to input it from outside. For speech, the recommended value of A is about 64 to 128.

The Wiener filter templating unit 110 receives the target sound learning data and the noise learning data and generates A Wiener filter templates G_{ω,1,…,A} (S110). Specifically, the templates are generated by the method described in (Solution for Problem 1: Templating the Wiener filter).

The Wiener filter templating unit 110 is described below with reference to FIGS. 5 and 6. FIG. 5 is a block diagram showing the configuration of the Wiener filter templating unit 110. FIG. 6 is a flowchart showing its operation. As shown in FIG. 5, the Wiener filter templating unit 110 includes a frequency domain conversion unit 111, a Wiener filter generation unit 112, and a clustering unit 113.

First, the frequency domain conversion unit 111 reads the target sound learning data and noise learning data from the recording units 910 and 920. It converts them into frequency-domain target sound learning data S_{ω,1,…,Ttrain} and frequency-domain noise learning data N_{ω,1,…,Ttrain} (S111). For example, a fast Fourier transform (FFT) may be used to convert the time-domain signals into frequency-domain signals. When the sampling rate is 16 kHz, the FFT length and the shift length may be set to, for example, 512 and 256 samples.
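The framing and transform in S111 can be sketched as a short-time Fourier transform. The text does not name an analysis window, so the periodic Hann window here is an assumption. The naive DFT keeps the sketch dependency-free; in practice an FFT routine such as `numpy.fft.rfft` would be used. The function names are hypothetical.

```python
import cmath
import math

def hann(n_fft):
    """Periodic Hann window (assumed; the text does not specify a window)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]

def stft(x, n_fft=512, hop=256):
    """Frame x with shift `hop`, apply the window, and take a naive DFT per frame.
    Returns frames holding the n_fft // 2 + 1 non-negative-frequency bins."""
    w = hann(n_fft)
    frames = []
    for start in range(0, len(x) - n_fft + 1, hop):
        seg = [x[start + n] * w[n] for n in range(n_fft)]
        frames.append([sum(seg[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                           for n in range(n_fft))
                       for k in range(n_fft // 2 + 1)])
    return frames
```

At 16 kHz, n_fft = 512 and hop = 256 correspond to 32 ms frames with a 16 ms shift.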

Next, the Wiener filter generation unit 112 generates the Wiener filters G_{ω,1,…,Ttrain} from the learning data S_{ω,1,…,Ttrain} and N_{ω,1,…,Ttrain} generated in S111, using Equation (3) (S112).

Finally, the clustering unit 113 generates A Wiener filter templates G_{ω,1,…,A} from the Wiener filters G_{ω,1,…,Ttrain} (S113). A classification method other than clustering may be used, as long as it can generate a finite number of templates.

The action-value function initialization unit 120 initializes the action-value function Q(x, a|Θ) (S120); that is, it generates the initial value of Q(x, a|Θ). When the action-value function is expressed by Equations (8) to (11), this amounts to determining initial values for two quantities of the j-th layer: the weight matrix W^(j) and the bias vector b^(j). For example, the elements of W^(j) and b^(j) may be generated using random numbers. Alternatively, they may be generated by backpropagation with a cross-entropy criterion.
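Random initialization of W^(j) and b^(j) can be sketched as below. It assumes that Equations (8) to (11) describe a stack of fully connected layers, which is consistent with the per-layer weight matrix and bias vector named in the text. The Gaussian distribution, its scale, and the function name are illustrative choices.

```python
import random

def init_layers(sizes, seed=0, scale=0.01):
    """Random initial (W^(j), b^(j)) for a fully connected stack.
    sizes = [input_dim, hidden_dim, ..., A]; W^(j) has one row per output unit."""
    rng = random.Random(seed)
    layers = []
    for d_in, d_out in zip(sizes[:-1], sizes[1:]):
        W = [[rng.gauss(0.0, scale) for _ in range(d_in)] for _ in range(d_out)]
        b = [rng.gauss(0.0, scale) for _ in range(d_out)]
        layers.append((W, b))
    return layers
```

Fixing the seed makes the initial Q(x, a|Θ) reproducible across runs.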

The sound source enhancement unit 130 uses the Wiener filter templates generated in S110 and the current action-value function Q(x, a|Θ). With these, it generates the enhanced target sound, the template numbers, and the state vectors (S130). S130 is executed repeatedly for each pair of learning data.

The sound source enhancement unit 130 is described below with reference to FIGS. 7 and 8. FIG. 7 is a block diagram showing the configuration of the sound source enhancement unit 130. FIG. 8 is a flowchart showing its operation. As shown in FIG. 7, the sound source enhancement unit 130 includes the following units:
- an observation signal generation unit 131,
- a frequency domain conversion unit 132,
- a state vector generation unit 133,
- a template selection unit 134,
- an enhanced target sound generation unit 135,
- a time domain conversion unit 136, and
- an output generation unit 137.

First, the observation signal generation unit 131 reads the target sound learning data and the noise learning data from the recording units 910 and 920. It superimposes them to generate a time-domain observation signal (S131).

Next, the frequency domain conversion unit 132 converts the observation signal generated in S131 into the frequency domain to generate the frequency-domain observation signal X_{ω,t} (S132). As in the frequency domain conversion unit 111, a fast Fourier transform (FFT) may be used to convert the time-domain signal into a frequency-domain signal.

The state vector generation unit 133 generates a state vector x_t for each time t from the observation signal X_{ω,t} generated in S132 (S133). For example, the state vector x_t may be formed by vertically concatenating the observation signals from P₁ frames before frame t to P₂ frames after it, as in Equation (26).

The concatenated observation signals may be compressed by a mel filter bank or the like. P₁ and P₂ may be set to about 10.
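The frame stacking of Equation (26) can be sketched as follows. The sketch assumes a real-valued per-frame feature, for example magnitudes or mel-compressed values of X_{ω,t}. The text does not say how frames before the start or after the end are handled, so clamping to the first or last frame is an assumption. The function name is hypothetical.

```python
def state_vectors(X, P1=10, P2=10):
    """Concatenate frames t-P1 .. t+P2 of a per-frame feature sequence into x_t.
    X: list of per-frame feature lists. Out-of-range frames are clamped (assumed)."""
    T = len(X)
    xs = []
    for t in range(T):
        x_t = []
        for tau in range(t - P1, t + P2 + 1):
            x_t.extend(X[min(max(tau, 0), T - 1)])
        xs.append(x_t)
    return xs
```

Each x_t has (P₁ + P₂ + 1) times the per-frame dimension, so with P₁ = P₂ = 10 it is 21 frames long.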

The template selection unit 134 computes the values of the action-value function Q(x_t, a|Θ) for the Wiener filter templates G_{ω,1,…,A} generated in S110. Here, x_t is the state vector generated in S133 and a is a template number. The unit then selects the optimal Wiener filter template (template number a_t) using Equation (7) (S134). The values of Q(x_t, a|Θ) may be computed using, for example, Equations (8) to (11).

Instead of Equation (7), the ε-greedy algorithm may be used to select the Wiener filter template. In this case, ε may be set to 0.01 or 0.05.
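Template selection can be sketched as below. This assumes that Equation (7) is the greedy choice a_t = argmax_a Q(x_t, a|Θ), which matches "selects the optimal template". The ε-greedy variant picks a uniformly random template with probability ε. The function name is hypothetical.

```python
import random

def select_template(q_values, epsilon=0.05, rng=random):
    """Return a 1-based template number: argmax of Q(x_t, a | Θ) (assumed Eq. (7)),
    or a uniformly random template with probability epsilon (ε-greedy)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values)) + 1
    return 1 + max(range(len(q_values)), key=lambda a: q_values[a])
```

Passing epsilon=0.0 recovers the purely greedy rule.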

The enhanced target sound generation unit 135 generates the frequency-domain enhanced target sound Y_{ω,t} (S135). It uses the optimal Wiener filter template G_{ω,a_t} selected in S134 and Equation (27).

Here, a_t is the number identifying the optimal Wiener filter template selected in S134, and a_t ∈ {1, …, A}.
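Applying the selected template can be sketched as follows. This assumes that Equation (27) is the usual element-wise Wiener gain Y_{ω,t} = G_{ω,a_t} X_{ω,t}. The function name is hypothetical.

```python
def enhance(X, templates, labels):
    """Y[t][w] = G[a_t][w] * X[t][w] (assumed element-wise form of Eq. (27)).
    X: complex STFT frames; templates: real gain vectors; labels: 1-based a_t."""
    return [[g * x for g, x in zip(templates[a - 1], frame)]
            for frame, a in zip(X, labels)]
```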

The time domain conversion unit 136 generates the time-domain enhanced target sound from the enhanced target sound Y_{ω,t} generated in S135 (S136). An inverse Fourier transform may be used for the conversion to the time domain.
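The inverse side can be sketched as a naive inverse DFT per frame followed by overlap-add, dividing by the summed analysis windows. The periodic Hann analysis window is an assumption, as the text does not name one. Samples whose window sum is zero are set to 0. The function names are hypothetical.

```python
import cmath
import math

def hann(n_fft):
    """Periodic Hann window, assumed to be the analysis window."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]

def irfft_frame(X, n_fft):
    """Inverse DFT of a one-sided spectrum (n_fft // 2 + 1 bins); n_fft assumed even."""
    full = list(X) + [X[k].conjugate() for k in range(n_fft // 2 - 1, 0, -1)]
    return [sum(full[k] * cmath.exp(2j * math.pi * k * n / n_fft)
                for k in range(n_fft)).real / n_fft
            for n in range(n_fft)]

def istft(frames, n_fft=512, hop=256, eps=1e-8):
    """Overlap-add resynthesis normalized by the summed analysis windows."""
    w = hann(n_fft)
    length = (len(frames) - 1) * hop + n_fft
    out, wsum = [0.0] * length, [0.0] * length
    for i, X in enumerate(frames):
        seg = irfft_frame(X, n_fft)
        for n in range(n_fft):
            out[i * hop + n] += seg[n]
            wsum[i * hop + n] += w[n]
    return [o / s if s > eps else 0.0 for o, s in zip(out, wsum)]
```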

Finally, the output generation unit 137 outputs the time-domain enhanced target sound, the selected optimal Wiener filter template numbers a₁, …, and the state vectors x₁, … for each time (S137).

The sound source enhancement unit 130 may also be configured without the time domain conversion unit 136. In this case, S136 is omitted. The output of S137 is then the frequency-domain enhanced target sound, the selected optimal Wiener filter template numbers a₁, …, and the state vectors x₁, … for each time.

The auditory score calculation unit 140 calculates the auditory score Z_iter from the enhanced target sound output in S130 (S140). Any auditory score that serves as an index of sound quality, such as PESQ or STOI, may be used. If computing the score requires them, the target sound learning data and the noise learning data are read from the recording units 910 and 920. If the enhanced target sound output in S130 is a frequency-domain signal, it is converted to a time-domain signal as necessary before Z_iter is calculated.

The action-value function update unit 150 updates the action-value function Q(x, a|Θ) using the auditory score calculated in S140 (S150). Specifically, the auditory score Z_iter is used as the reward, and the parameter Θ is updated by Equations (18) and (15). Each value Q(x_t, a_t|Θ) of the action-value function is computed from the template numbers a₁, … and the state vectors x₁, … output in S130.

The value α in Equation (15) may be set to about 10⁻³.

The convergence determination unit 160 checks whether the number of updates has reached the predetermined number specified at the start of execution. If so, it outputs the current action-value function Q(x, a|Θ) and ends the process. Otherwise, the process returns to S130 and Q(x, a|Θ) is updated again (S160).
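The flow from S130 to S160 amounts to the loop below. This is only a structural sketch. The three callables are hypothetical placeholders for the enhancement unit 130, the auditory score calculation unit 140 (PESQ, STOI, or similar), and the update unit 150, whose Equations (15) and (18) are not restated in this section. Cycling through the learning data pairs in order is also an assumption.

```python
def train(q, utterances, enhance_fn, score_fn, update_fn, n_updates):
    """Skeleton of S130-S160 (Embodiment 1). Hypothetical callables:
      enhance_fn(q, utt) -> (enhanced, labels, states)   # S130
      score_fn(enhanced, utt) -> auditory score Z_iter    # S140
      update_fn(q, states, labels, reward) -> new q       # S150
    """
    for it in range(n_updates):                   # S160: fixed number of updates
        utt = utterances[it % len(utterances)]    # one learning-data pair per update
        enhanced, labels, states = enhance_fn(q, utt)
        z = score_fn(enhanced, utt)
        q = update_fn(q, states, labels, z)       # Z_iter itself is the reward here
    return q
```

Embodiment 2 would pass z through a binarizing step before `update_fn`. Embodiment 3 would also hand per-frame update weights to the update.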

According to the invention of this embodiment, applying reinforcement learning to sound source enhancement makes it possible to use objective functions other than the squared error represented by Equations (5) and (6). As a result, sound source enhancement can be optimized with an objective function suited to acoustic signal processing, specifically an action-value function that reflects an auditory score. That is, sound source enhancement can be learned with suppressed sound quality degradation.

<Embodiment 2>
In Embodiment 1, the auditory score calculated by the auditory score calculation unit 140 is generally a continuous value. However, as described in (Solution for Problem 2: Threshold decision on the auditory score), a binary reward allows a discriminative model to be applied in reinforcement learning. The action-value function can then be estimated more accurately. Therefore, a process that binarizes the continuous auditory score by a threshold decision is added.

The sound source enhancement learning device 200 of Embodiment 2 is described below with reference to FIGS. 9 and 10. FIG. 9 is a block diagram showing the configuration of the sound source enhancement learning device 200. FIG. 10 is a flowchart showing its operation. The sound source enhancement learning device 200 differs from the sound source enhancement learning device 100 only in the addition of an auditory score binarization unit 240.

Accordingly, the auditory score binarization unit 240 is described below with reference to FIGS. 11 and 12. FIG. 11 is a block diagram showing the configuration of the auditory score binarization unit 240. FIG. 12 is a flowchart showing its operation. As shown in FIG. 11, the auditory score binarization unit 240 includes a binarization unit 241 and a threshold update unit 242.

The binarization unit 241 binarizes the auditory score Z_iter generated in S140 to produce the binarized auditory score r_t^iter (S241). Specifically, Equation (19) from (Solution for Problem 2: Threshold decision on the auditory score) is used. If Z_iter is greater than or equal to the threshold φ, r_t^iter is set to R; if Z_iter is smaller than φ, r_t^iter is set to −R.

Equation (20) may be used instead of Equation (19) to generate the binarized auditory score r_t^iter.

The threshold update unit 242 updates the threshold φ every time S241 has been executed ITER_thres-update times (S242). The method is as described in (Solution for Problem 2: Threshold decision on the auditory score). Only when the average φ̄ of the auditory scores Z_iter is larger than φ − β is the threshold updated as φ ← φ̄ + β.

R may be set to about 0.05, β to about 3, the initial value of φ to 0, and ITER_thres-update to about 20.

The action-value function update unit 150 updates the action-value function Q(x, a|Θ) using the binarized auditory score calculated in S240 as the reward (S150).

According to the invention of this embodiment, a binary value is used as the auditory score in reinforcement learning, so the action-value function can be estimated accurately. That is, sound source enhancement can be learned in a way that further suppresses sound quality degradation.

<Embodiment 3>
In Embodiment 1, the action-value function update unit 150 calculates the function L_Θ, which represents the error from the target value, using Equation (18). However, as described in (Solution for Problem 3: Computing update weights based on the squared error), the squared error value in L_Θ can be computed for each frame t. This corrects only those portions of an utterance that lower the auditory score and estimates the action-value function more accurately. Therefore, a process that calculates the update weight w_t used in computing L_Θ is added.

The sound source enhancement learning device 300 of Embodiment 3 is described below with reference to FIGS. 13 and 14. FIG. 13 is a block diagram showing the configuration of the sound source enhancement learning device 300. FIG. 14 is a flowchart showing its operation. The sound source enhancement learning device 300 differs from the sound source enhancement learning device 100 in two respects. An update weight calculation unit 340 is added, and an action-value function update unit 350 replaces the action-value function update unit 150.

First, the update weight calculation unit 340 is described with reference to FIGS. 15 and 16. FIG. 15 is a block diagram showing the configuration of the update weight calculation unit 340. FIG. 16 is a flowchart showing its operation. As shown in FIG. 15, the update weight calculation unit 340 includes a frequency domain conversion unit 341, an error calculation unit 342, an error normalization unit 343, and an update weight determination unit 344.

The frequency domain conversion unit 341 converts the target sound learning data recorded in the target sound learning data recording unit 910 into the frequency domain to generate the frequency-domain target sound S_{ω,t} (S341). As in the frequency domain conversion unit 111, a fast Fourier transform (FFT) may be used to convert the time-domain signal into a frequency-domain signal.

The error calculation unit 342 calculates the squared error E_t by Equation (21) (S342). It uses the target sound S_{ω,t} generated in S341 and the frequency-domain enhanced target sound Y_{ω,t} generated in S135. The error normalization unit 343 calculates the normalized squared error from E_t using Equation (22) (S343).

The update weight determination unit 344 determines the update weight from the normalized squared error (S344), specifically using Equation (23). That is, when the auditory score $Z_{iter}$ is equal to or greater than the threshold $\varphi$, the update weight $w_t$ is set to 1 minus the normalized squared error; when $Z_{iter}$ is smaller than $\varphi$, $w_t$ is set to the normalized squared error itself.
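The rule of Equation (23) as described in S344 is a simple switch on the auditory score. A minimal sketch follows (the function name is hypothetical). One reading of the design: when the utterance scores well, frames with small error receive large weights, and when it scores poorly, frames with large error receive large weights, so the update concentrates on the frames most likely responsible for the low score.

```python
def update_weights(norm_err, z_iter, phi):
    """Eq. (23) as described: w_t = 1 - e_t if Z_iter >= phi, otherwise w_t = e_t."""
    if z_iter >= phi:
        return [1.0 - e for e in norm_err]
    return list(norm_err)
```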

Although S342 to S344 were described using the squared error and the normalized squared error, errors other than the squared error may be used, as stated in (Solution to Problem 3: calculation of update weights based on squared error). In that case, "squared error" and "normalized squared error" in S342 to S344 should be read as "error" and "normalized error".

Next, the action value function update unit 350 will be described. The action value function update unit 350 updates the action value function $Q(x, a|\Theta)$ using the auditory score calculated in S140 and the update weights calculated in S340 (S350). Specifically, with the auditory score $Z_{iter}$ as the reward, $L_\Theta$ is calculated using Equation (24) instead of Equation (18), and the parameter $\Theta$ is updated according to Equation (15), thereby updating $Q(x, a|\Theta)$.
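Equations (15), (18) and (24) are not reproduced here, so the following sketch only illustrates the idea of a frame-weighted loss under loud assumptions: $Q$ is a hypothetical linear model $\Theta_a \cdot x$, each frame's error value is assumed to be $(Z_{iter} - Q(x_t, a_t))^2$, each is multiplied by $w_t$ and summed over frames, and one plain gradient-descent step stands in for Equation (15).

```python
def q_value(theta, x, a):
    """Hypothetical linear action value function Q(x, a | theta) = theta[a] . x."""
    return sum(th * xi for th, xi in zip(theta[a], x))


def weighted_loss(theta, states, actions, reward, weights):
    """L_theta = sum_t w_t * (reward - Q(x_t, a_t))^2 (assumed form of Eq. (24))."""
    return sum(
        w * (reward - q_value(theta, x, a)) ** 2
        for x, a, w in zip(states, actions, weights)
    )


def sgd_step(theta, states, actions, reward, weights, lr=0.01):
    """One full-batch gradient-descent step on weighted_loss; returns new parameters."""
    new = [row[:] for row in theta]
    for x, a, w in zip(states, actions, weights):
        resid = reward - q_value(theta, x, a)  # residual uses the old parameters
        for i, xi in enumerate(x):
            # d/d theta[a][i] of w * (r - theta[a].x)^2 is -2 * w * resid * x_i
            new[a][i] += lr * 2.0 * w * resid * xi
    return new
```

A frame with weight 0 contributes nothing to the gradient, so its template's parameters are left unchanged by the step.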

According to this embodiment, the squared error value in $L_\Theta$ is calculated for each frame $t$ so that only the portions of an utterance that cause the auditory score to drop are corrected; the action value function can therefore be estimated accurately. That is, sound source enhancement can be learned in a way that further suppresses sound quality degradation.

<Embodiment 4>
In the first embodiment, because the action value function is calculated sequentially using Equation (16), reinforcement learning tends to depend on the initial value and easily falls into a local solution. Therefore, the processing of the action value function initialization unit is changed so that calculation of the action value function can start from an initial value close to the optimal target value. This allows the action value function to be estimated accurately.

Hereinafter, the sound source enhancement learning device 400 according to the fourth embodiment will be described with reference to FIGS. 17 and 18. FIG. 17 is a block diagram illustrating the configuration of the sound source enhancement learning device 400. FIG. 18 is a flowchart showing the operation of the sound source enhancement learning device 400. The sound source enhancement learning device 400 differs from the sound source enhancement learning device 100 only in that an action value function initialization unit 420 is provided instead of the action value function initialization unit 120.

The action value function initialization unit 420 will therefore be described below with reference to FIGS. 19 and 20. FIG. 19 is a block diagram showing the configuration of the action value function initialization unit 420. FIG. 20 is a flowchart showing the operation of the action value function initialization unit 420. As shown in FIG. 19, the action value function initialization unit 420 includes an optimal Wiener filter determination unit 421, an observation signal generation unit 131, a frequency domain conversion unit 132, a state vector generation unit 133, and an action value function discriminative learning unit 422.

The optimal Wiener filter determination unit 421 determines the Wiener filter numbers $a_{1,\dots,T_{train}}$ using the Wiener filters $G_{\omega,1,\dots,T_{train}}$ generated in S112 (S421). Specifically, Equation (25) from (Solution to Problem 4: determination of initial value based on squared error) is used; that is, the Wiener filter numbers $a_{1,\dots,T_{train}}$ that are optimal in the squared-error sense are determined.
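Equation (25) is not reproduced in this section. One plausible reading, assumed here, is that for each training frame the template number is the one whose gain curve is closest, in the squared-error sense, to the per-frame Wiener filter $G_{\omega,t}$ from S112. The sketch uses 0-based template numbers (the text uses 1 to A); the function name is hypothetical.

```python
def optimal_template_numbers(G_frames, templates):
    """For each frame t, return argmin_a sum_w (G[t][w] - T[a][w])^2 (0-based).

    Assumed reading of Eq. (25): nearest template to the per-frame Wiener filter.
    """
    numbers = []
    for g in G_frames:
        dists = [sum((gw - tw) ** 2 for gw, tw in zip(g, tmpl)) for tmpl in templates]
        numbers.append(min(range(len(templates)), key=dists.__getitem__))
    return numbers
```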

The observation signal generation unit 131, the frequency domain conversion unit 132, and the state vector generation unit 133 generate state vectors from the target sound learning data and the noise learning data read from the target sound learning data recording unit 910 and the noise learning data recording unit 920, respectively (S131 to S133).

The action value function discriminative learning unit 422 discriminatively learns the action value function $Q(x, a|\Theta)$ from pairs of the state vector $x_t$ and the Wiener filter corresponding to the Wiener filter number $a_t$ determined in S421 (S422). Specifically, $Q(x, a|\Theta)$ may be trained discriminatively so that it outputs the template number $a_t$ when the state vector $x_t$ is input.
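S422 can be sketched as ordinary multiclass classification. The model here is an assumption, not the patent's network: a linear score $\Theta_a \cdot x$ per template number, trained with softmax cross-entropy by full-batch gradient descent, whose scores could then serve as the initial $Q(x, a|\Theta)$. All names and hyperparameters are hypothetical.

```python
import math


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def train_initial_q(states, labels, num_actions, lr=0.5, epochs=200):
    """Fit theta so that argmax_a theta[a].x reproduces a_t for x_t
    (multinomial logistic regression, full-batch gradient descent)."""
    dim = len(states[0])
    theta = [[0.0] * dim for _ in range(num_actions)]
    n = len(states)
    for _ in range(epochs):
        grad = [[0.0] * dim for _ in range(num_actions)]
        for x, y in zip(states, labels):
            p = softmax([sum(t * xi for t, xi in zip(row, x)) for row in theta])
            for a in range(num_actions):
                coef = p[a] - (1.0 if a == y else 0.0)  # cross-entropy gradient w.r.t. score a
                for i, xi in enumerate(x):
                    grad[a][i] += coef * xi / n
        for a in range(num_actions):
            for i in range(dim):
                theta[a][i] -= lr * grad[a][i]
    return theta


def predict(theta, x):
    """Template number with the highest score for state vector x."""
    scores = [sum(t * xi for t, xi in zip(row, x)) for row in theta]
    return scores.index(max(scores))
```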

According to this embodiment, learning of the action value function in reinforcement learning starts from a better initial value, so the action value function can be estimated accurately. That is, sound source enhancement can be learned in a way that further suppresses sound quality degradation.

<Embodiment 5>
The first to fourth embodiments described methods for learning the action value function $Q(x, a|\Theta)$ (where $x$ is a state vector generated from a set of target sound learning data and noise learning data, and $a$ is a Wiener filter template number) from a set of target sound learning data and noise learning data. This embodiment describes a method for generating an enhanced target sound from an observation signal picked up by a microphone, using the action value function $Q(x, a|\Theta)$ learned in the first to fourth embodiments. This makes it possible to output an enhanced target sound in which the target sound in the observation signal has been enhanced while sound quality degradation is suppressed.

The action value function $Q(x, a|\Theta)$ here is expressed using the value of $\Theta$ at the end of learning.

Hereinafter, the sound source enhancement device 500 according to the fifth embodiment will be described with reference to FIGS. 21 and 22. FIG. 21 is a block diagram illustrating the configuration of the sound source enhancement device 500. FIG. 22 is a flowchart showing the operation of the sound source enhancement device 500. As illustrated in FIG. 21, the sound source enhancement device 500 includes a state vector generation unit 510, an action value function evaluation unit 520, and an enhanced target sound generation unit 530.

The sound source enhancement device 500 is connected to an observation signal recording unit 930, in which observation signals picked up in advance are recorded. The observation signal is the target of sound source enhancement and, for simplicity, is assumed to be recorded as a frequency domain signal.

The sound source enhancement device 500 is further connected to a learning result recording unit 940, in which the Wiener filter templates $G_{\omega,1,\dots,A}$ and the action value function $Q(x, a|\Theta)$, generated in advance using any of the sound source enhancement learning devices 100 to 400, are recorded.

The state vector generation unit 510 generates a state vector $x_t$ from the observation signal $X_{\omega,t}$ picked up by the microphone (where $t \in \{1, \dots, T\}$ and $\omega \in \{1, \dots, \Omega\}$) using Equation (26) (S510).
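Equation (26) is not reproduced in this section, so the feature below is purely a hypothetical stand-in: the log power spectra of frame $t$ and the preceding `context` frames, concatenated into one vector, with indices before the first frame clamped to frame 0. The context length and the log floor are illustrative assumptions.

```python
import math


def state_vector(X, t, context=2, floor=1e-10):
    """Hypothetical stand-in for Eq. (26): stack the log power spectra of
    frames t-context..t (indices below 0 clamped to frame 0) into one list."""
    vec = []
    for k in range(t - context, t + 1):
        frame = X[max(k, 0)]
        vec.extend(math.log(abs(v) ** 2 + floor) for v in frame)
    return vec
```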

The action value function evaluation unit 520 reads the Wiener filter templates $G_{\omega,1,\dots,A}$ and the action value function $Q(x, a|\Theta)$ from the learning result recording unit 940, calculates $Q(x_t, a|\Theta)$ for each Wiener filter template (where $x_t$ is the state vector generated in S510 and $a \in \{1, \dots, A\}$ is the template number), and selects the optimal Wiener filter template $G_{\omega,a_t}$ (template number $a_t$) using Equation (7) (S520). That is, it executes basically the same processing as the template selection unit 134.
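Assuming Equation (7) is the usual greedy choice $a_t = \arg\max_a Q(x_t, a|\Theta)$ (the equation itself is not shown here), S520 reduces to evaluating $Q$ once per template and taking the best one. `q_fn` is a hypothetical callable standing in for the learned action value function; template numbers are 0-based in this sketch.

```python
def select_template(q_fn, x_t, num_templates):
    """Return a_t = argmax_a Q(x_t, a) over template numbers 0..A-1
    (assumed reading of Eq. (7); ties go to the lowest number)."""
    values = [q_fn(x_t, a) for a in range(num_templates)]
    return max(range(num_templates), key=values.__getitem__)
```

Because `max` returns the first maximal element, ties are resolved toward the lowest template number.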

The enhanced target sound generation unit 530 generates the frequency domain enhanced target sound $Y_{\omega,t}$ using the optimal Wiener filter template $G_{\omega,a_t}$ selected in S520 and Equation (27) (S530). That is, it executes basically the same processing as the enhanced target sound generation unit 135.
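Equation (27) is not reproduced here; the sketch assumes the standard Wiener filtering form $Y_{\omega,t} = G_{\omega,a_t} X_{\omega,t}$, in which each bin of the observed frame is scaled by the selected template's real-valued gain. Template numbers are 0-based, and the function name is hypothetical.

```python
def enhance(X, templates, numbers):
    """Y[t][w] = G[a_t][w] * X[t][w] (assumed form of Eq. (27))."""
    return [
        [g * x for g, x in zip(templates[a], frame)]
        for frame, a in zip(X, numbers)
    ]
```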

According to this embodiment, it is possible to output an enhanced target sound in which the target sound in the observation signal has been enhanced while sound quality degradation is suppressed.

<Modification 1>
Although the first to fourth embodiments use the auditory score as the reward, a reward other than the auditory score can be used. For example, to optimize sound source enhancement for speech recognition, a binary value indicating whether the speech recognition result was correct or incorrect may be used as the reward. In this case, the auditory score calculation unit 140 in the first to fourth embodiments may be replaced with a speech recognition unit that performs speech recognition on the enhanced target sound output in S130 and outputs the recognition result.

<Modification 2>
The Wiener filter templating unit 110 of the first to fourth embodiments generates a finite number of Wiener filter templates using clustering. Instead of clustering, Wiener filters may be generated based on a squared error such as Equation (5) or Equation (6), and Wiener filter templates with a high SN ratio may be generated using an SN ratio such as Equation (3) as the criterion (hereinafter referred to as the first criterion). Alternatively, Wiener filter templates may be designed in consideration of the characteristics of a specific input sound (for example, speech or sports sounds) so that the SN ratio is high for that input sound.

Wiener filter templates generated in this way do not necessarily yield a high auditory score. Therefore, the action value function update unit 150 updates the action value function using the auditory score as a criterion (hereinafter referred to as the second criterion) so that the auditory score becomes high. Specifically, the action value function is updated so that a Wiener filter template satisfying not only the first criterion but also the second criterion is selected.

This makes it possible to learn sound source enhancement with suppressed sound quality degradation.

<Supplementary note>
The device of the present invention has, for example as a single hardware entity, an input unit to which a keyboard or the like can be connected, an output unit to which a liquid crystal display or the like can be connected, a communication unit to which a communication device capable of communicating outside the hardware entity (for example, a communication cable) can be connected, a CPU (Central Processing Unit, which may include a cache memory, registers, and the like), RAM and ROM serving as memory, an external storage device such as a hard disk, and a bus connecting the input unit, output unit, communication unit, CPU, RAM, ROM, and external storage device so that they can exchange data. If necessary, the hardware entity may also be provided with a device (drive) that can read from and write to a recording medium such as a CD-ROM. A general-purpose computer is an example of a physical entity having such hardware resources.

The external storage device of the hardware entity stores the programs required to realize the functions described above and the data required for processing by these programs (the programs need not be stored in the external storage device; they may, for example, be stored in ROM, a read-only storage device). Data obtained through processing by these programs is stored in RAM, the external storage device, or the like as appropriate.

In the hardware entity, each program stored in the external storage device (or ROM or the like) and the data required to process it are read into memory as needed and interpreted, executed, and processed by the CPU as appropriate. As a result, the CPU realizes the predetermined functions (the components referred to above as "... unit", "... means", and so on).

The present invention is not limited to the embodiments described above and may be modified as appropriate without departing from its spirit. The processes described in the above embodiments may be executed not only chronologically in the order described, but also in parallel or individually according to the processing capacity of the device executing them or as required.

As described above, when the processing functions of the hardware entity (the device of the present invention) described in the above embodiments are realized by a computer, the processing content of the functions that the hardware entity should have is described by a program. By executing this program on the computer, the processing functions of the hardware entity are realized on the computer.

The program describing this processing content can be recorded on a computer-readable recording medium, which may be any medium such as a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory. Specifically, for example, a hard disk device, a flexible disk, or magnetic tape can be used as the magnetic recording device; a DVD (Digital Versatile Disc), DVD-RAM (Random Access Memory), CD-ROM (Compact Disc Read Only Memory), or CD-R (Recordable)/RW (ReWritable) as the optical disc; an MO (Magneto-Optical disc) as the magneto-optical recording medium; and an EEP-ROM (Electronically Erasable and Programmable-Read Only Memory) as the semiconductor memory.

The program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which it is recorded. The program may also be distributed by storing it in a storage device of a server computer and transferring it from the server computer to other computers via a network.

A computer that executes such a program first stores, for example, the program recorded on a portable recording medium or transferred from a server computer in its own storage device. When executing processing, the computer reads the program stored in its own recording medium and executes processing according to that program. As another form of execution, the computer may read the program directly from the portable recording medium and execute processing according to it, or it may execute processing according to the received program each time the program is transferred to it from the server computer. The processing described above may also be executed by a so-called ASP (Application Service Provider) type service, in which the program is not transferred from the server computer to this computer and the processing functions are realized only through execution instructions and acquisition of results. The program in this embodiment includes information that is provided for processing by an electronic computer and is equivalent to a program (such as data that is not a direct command to the computer but has the property of defining the computer's processing).

In this embodiment, the hardware entity is configured by executing a predetermined program on a computer; however, at least part of the processing may be realized in hardware.

DESCRIPTION OF SYMBOLS
100 Sound source enhancement learning device
110 Wiener filter templating unit
111 Frequency domain conversion unit
112 Wiener filter generation unit
113 Clustering unit
120 Action value function initialization unit
130 Sound source enhancement unit
131 Observation signal generation unit
132 Frequency domain conversion unit
133 State vector generation unit
134 Template selection unit
135 Enhanced target sound generation unit
136 Time domain conversion unit
137 Output generation unit
140 Auditory score calculation unit
150 Action value function update unit
160 Convergence determination unit
200 Sound source enhancement learning device
240 Auditory score binarization unit
241 Binarization unit
242 Threshold update unit
300 Sound source enhancement learning device
340 Update weight calculation unit
341 Frequency domain conversion unit
342 Error calculation unit
343 Error normalization unit
344 Update weight determination unit
400 Sound source enhancement learning device
420 Action value function initialization unit
421 Optimal Wiener filter determination unit
422 Action value function discriminative learning unit
500 Sound source enhancement device
510 State vector generation unit
520 Action value function evaluation unit
530 Enhanced target sound generation unit

Claims

A sound source enhancement learning device comprising: a Wiener filter templating unit that generates, from a set of frequency domain target sound learning data and frequency domain noise learning data, a finite number of Wiener filters as Wiener filter templates;
an action value function initialization unit that generates an initial value of an action value function;
a sound source enhancement unit that generates a state vector expressed using a frequency domain observation signal generated from the set of the frequency domain target sound learning data and the frequency domain noise learning data, and generates an enhanced target sound by applying, to the frequency domain observation signal, an optimal Wiener filter template selected based on the value of the action value function calculated using the state vector and the Wiener filter templates;
an action value function update unit that updates the action value function using an auditory score calculated from the enhanced target sound; and
a convergence determination unit that outputs the action value function when a predetermined convergence condition is satisfied.

The sound source enhancement learning device according to claim 1,
further comprising
an auditory score binarization unit that generates a binarized auditory score by binarizing the auditory score.

The sound source enhancement learning device according to claim 1 or 2,
wherein, letting $L_\Theta$ be an error function used by the action value function update unit to update the action value function, the error function being expressed as the sum over all frames of a value calculated for each frame using the auditory score (hereinafter referred to as an error value),
the device further comprises
an update weight calculation unit that calculates an update weight for each frame using the frequency domain target sound learning data and the frequency domain enhanced target sound generated by the sound source enhancement unit, and
the action value function update unit updates the action value function using, as the error function $L_\Theta$, a function expressed as the sum over all frames of the error value multiplied by the update weight.

The sound source enhancement learning device according to any one of claims 1 to 3,
wherein the action value function initialization unit
determines, from among the Wiener filters generated from the set of the frequency domain target sound learning data and the frequency domain noise learning data, a number identifying the optimal Wiener filter, and uses as the initial value an action value function discriminatively learned from pairs of the number and a state vector generated from the set of the frequency domain target sound learning data and the frequency domain noise learning data.

A sound source enhancement device that generates a frequency domain enhanced target sound by performing sound source enhancement on a frequency domain observation signal, using the Wiener filter templates and the action value function generated by the sound source enhancement learning device according to claim 1, the device comprising:
a state vector generation unit that generates a state vector from the frequency domain observation signal;
an action value function evaluation unit that selects an optimal Wiener filter template based on the value of the action value function calculated using the state vector and the Wiener filter templates; and
an enhanced target sound generation unit that generates the frequency domain enhanced target sound from the frequency domain observation signal using the optimal Wiener filter template.

A sound source enhancement learning device comprising: a Wiener filter templating unit that generates, as Wiener filter templates, Wiener filters satisfying a first criterion from a set of frequency domain target sound learning data and frequency domain noise learning data;
an action value function initialization unit that generates an initial value of an action value function;
a sound source enhancement unit that generates a state vector expressed using a frequency domain observation signal generated from the set of the frequency domain target sound learning data and the frequency domain noise learning data, and generates an enhanced target sound by applying, to the frequency domain observation signal, an optimal Wiener filter template selected based on the value of the action value function calculated using the state vector and the Wiener filter templates;
an action value function update unit that updates the action value function, using an auditory score that is a value obtained by evaluating the enhanced target sound, so that a Wiener filter template satisfying both the first criterion and a second criterion is selected; and
a convergence determination unit that outputs the action value function when a predetermined convergence condition is satisfied.

A sound source enhancement learning method in which a sound source enhancement learning device generates and outputs an action value function from a set of frequency domain target sound learning data and frequency domain noise learning data, the method comprising:
a Wiener filter templating step in which the sound source enhancement learning device generates a finite number of Wiener filters as Wiener filter templates from the set of the frequency domain target sound learning data and the frequency domain noise learning data;
an action value function initialization step in which the sound source enhancement learning device initializes the action value function;
a sound source enhancement step in which the sound source enhancement learning device generates a state vector expressed using a frequency domain observation signal generated from the set of the frequency domain target sound learning data and the frequency domain noise learning data, and generates an enhanced target sound by applying, to the frequency domain observation signal, an optimal Wiener filter template selected based on the value of the action value function calculated using the state vector and the Wiener filter templates;
an action value function update step in which the sound source enhancement learning device updates the action value function using an auditory score calculated from the enhanced target sound; and
a convergence determination step in which the sound source enhancement learning device outputs the action value function when a predetermined convergence condition is satisfied.

A program for causing a computer to function as the sound source enhancement learning device according to any one of claims 1 to 4 or the sound source enhancement device according to claim 5.