JP5821584B2

JP5821584B2 - Audio processing apparatus, audio processing method, and audio processing program

Info

Publication number: JP5821584B2
Application number: JP2011265168A
Authority: JP
Inventors: 悟朗河崎; 松尾　直司; 直司松尾
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2011-12-02
Filing date: 2011-12-02
Publication date: 2015-11-24
Anticipated expiration: 2031-12-02
Also published as: JP2013117639A

Description

The present invention relates to an audio processing device, an audio processing method, and an audio processing program.

Conventionally, when an audio signal is filtered, for example, the filtering is widely performed in the frequency domain, because it requires less computation than filtering in the time domain.

In frequency-domain filtering, for example, a plurality of analysis frames are cut out of the audio signal in order to convert the time-domain audio signal into the frequency domain. If the cut-out analysis frames are simply added back together, the ends of the frames do not join continuously, and noise is introduced into the original audio signal. To make the frame ends continuous and bring the sound quality closer to that of the original audio signal, a method called overlap addition is used, for example. In this method, each analysis frame is multiplied by a window function, and the frames are added so that they overlap by 50% of the frame length.

As an improvement on this overlap addition, setting the proportion by which the analysis frames overlap to 87.5% has also been proposed. The proportion by which analysis frames overlap is called the overlap ratio.

JP 2004-20679 A

Curtis Roads, "Computer Music: History, Technology, Art" (Japanese edition), Tokyo Denki University Press, January 2001

However, the conventional technique described above has the problem that the amount of computation for filtering increases. For example, if the overlap ratio is raised to 50%, 75%, 87.5%, and so on in order to improve sound quality, the amount of computation for filtering grows by factors of 2, 4, 8, and so on. Consequently, when a low-quality sound source is filtered or when filtering is performed in a noisy environment, for example, raising the overlap ratio beyond a certain point may barely improve the sound quality while only the amount of computation increases.
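The growth in computation can be illustrated by counting analysis frames. The following is a minimal sketch, assuming that the cost of filtering is proportional to the number of frames and that frames start every `frame_len * (1 - overlap)` samples; the function name and the sample counts are hypothetical and do not come from the patent.

```python
def frames_needed(num_samples: int, frame_len: int, overlap: float) -> int:
    """Count the analysis frames that fit in num_samples at a given overlap ratio."""
    hop = int(round(frame_len * (1.0 - overlap)))  # samples between frame starts
    if hop <= 0:
        raise ValueError("overlap ratio must be below 1")
    if num_samples < frame_len:
        return 0
    return (num_samples - frame_len) // hop + 1


if __name__ == "__main__":
    total, frame_len = 25600, 256  # hypothetical signal and frame lengths
    baseline = frames_needed(total, frame_len, 0.0)
    for overlap in (0.5, 0.75, 0.875):
        count = frames_needed(total, frame_len, overlap)
        # The frame count roughly doubles with each step of the overlap ratio.
        print(overlap, count, round(count / baseline, 2))
```

Each frame requires one forward transform, one filter multiplication, and one inverse transform, so under this assumption the frame count is a reasonable proxy for the cost the description refers to.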

The disclosed technique has been made in view of the above, and its object is to provide an audio processing device, an audio processing method, and an audio processing program that can suppress the amount of computation for filtering.

In one aspect, the technique disclosed in the present application includes a filter processing unit and a detection unit. The filter processing unit performs frequency-domain filtering on an input signal using window function processing in which analysis frames overlap at a predetermined ratio. Each time the ratio at which the analysis frames overlap is increased, the detection unit calculates the similarity between the filtered signal and a signal based on the input signal, and, based on the calculated similarities, detects the ratio to be set in the filter processing unit.

According to one aspect of the technique disclosed in the present application, the amount of computation for filtering can be suppressed.

FIG. 1 is a block diagram illustrating the functional configuration of the audio processing device according to the first embodiment.
FIG. 2 is a diagram for explaining an example of the signal flow in the audio processing device according to the first embodiment.
FIG. 3 is a diagram for explaining overlap.
FIG. 4 is a diagram for explaining overlap.
FIG. 5 is a diagram for explaining overlap.
FIG. 6 is a diagram for explaining the process of comparing the sound filtered with 50% overlap addition against the original sound.
FIG. 7 is a diagram for explaining the process of comparing the sound filtered with 75% overlap addition against the original sound.
FIG. 8 is a diagram for explaining the process of comparing the sound filtered with 87.5% overlap addition against the original sound.
FIG. 9 is a diagram for explaining the relationship between the overlap ratio and the coefficient of determination.
FIG. 10 is a flowchart illustrating the processing procedure of the audio processing device according to the first embodiment.
FIG. 11 is a block diagram illustrating the functional configuration of the audio processing device according to the second embodiment.
FIG. 12 is a diagram for explaining an example of the signal flow in the audio processing device according to the second embodiment.
FIG. 13 is a flowchart illustrating the processing procedure of the audio processing device according to the second embodiment.
FIG. 14 is a diagram for explaining the relationship between the coefficient of determination and sound quality.
FIG. 15 is a diagram for explaining the relationship between the coefficient of determination and sound quality.
FIG. 16 is a flowchart illustrating the processing procedure of the audio processing device according to the third embodiment.
FIG. 17 is a flowchart illustrating the processing procedure of the audio processing device according to the third embodiment.
FIG. 18 is a diagram illustrating an example of a computer that executes the audio processing program.

Embodiments of the audio processing device, audio processing method, and audio processing program disclosed in the present application are described in detail below with reference to the drawings. The invention is not limited by these embodiments. The embodiments can be combined as appropriate to the extent that their processing does not conflict.

An example of the functional configuration of the audio processing device according to the first embodiment is described. FIG. 1 is a block diagram illustrating the functional configuration of the audio processing device according to the first embodiment. As illustrated in FIG. 1, the audio processing device 100 includes a signal acquisition unit 110, a calculation unit 120, a filter processing unit 130, and a ratio determination unit 140. The audio processing device 100 corresponds to, for example, a mobile terminal device such as a mobile phone, a smartphone, or a PHS (Personal Handy-phone System) terminal, or an audio playback device such as a CD (Compact Disk) player or a digital audio player. The audio processing device 100 is connected to a speaker 10 and a microphone 20. The speaker 10 is a device that outputs sound to its surroundings. The microphone 20 is a device that collects sound from its surroundings. Although the speaker 10 and the microphone 20 are described here as external devices connected to the audio processing device 100, the present invention is not limited to this. For example, the speaker 10 and the microphone 20 may be built into the audio processing device 100.

The processing functions performed by the signal acquisition unit 110, the calculation unit 120, the filter processing unit 130, and the ratio determination unit 140 are implemented as follows. All or any part of these processing functions may be implemented by a CPU (Central Processing Unit) and a program that is analyzed and executed by the CPU, or may be implemented as hardware using wired logic.

FIG. 2 is a diagram for explaining an example of the signal flow in the audio processing device according to the first embodiment. Each processing function shown in FIG. 2 corresponds to the processing function with the same reference numeral in FIG. 1. The signal flow in the audio processing device is described together with each processing function of the audio processing device 100.

The signal acquisition unit 110 acquires an audio signal from a sound source and outputs the acquired audio signal. For example, the signal acquisition unit 110 acquires an audio signal from an antenna that receives FM (Frequency Modulation) radio, or from a memory in which audio data is stored. As illustrated in FIG. 2, the signal acquisition unit 110 outputs the acquired audio signal to the speaker 10, the calculation unit 120, the window function processing unit 131, and the detection unit 141. When the acquired audio signal is an analog signal, the signal acquisition unit 110 converts it into a digital signal before outputting it. In the following description, the audio signal output from the signal acquisition unit 110 is also referred to as the "original sound".

The calculation unit 120 calculates an FIR (Finite Impulse Response) filter. For example, the calculation unit 120 calculates the filter coefficients X(s) to be applied to the FIR filter 133 described later. The calculation unit 120 calculates the filter coefficients X(s) using the audio signal output from the signal acquisition unit 110 and the signal collected by the microphone 20, and sets the calculated filter coefficients X(s) in the FIR filter 133. As illustrated in FIG. 2, the calculation unit 120 outputs to the FIR filter 133 the information for setting the calculated filter coefficients X(s).

The process by which the calculation unit 120 calculates the filter coefficients X(s) is now described. For example, the calculation unit 120 calculates the impulse response H(s), which is the acoustic characteristic from the speaker 10 to the microphone 20. To do so, the calculation unit 120 converts the audio signal output from the signal acquisition unit 110 and the signal collected by the microphone 20 into the frequency domain, and calculates the impulse response H(s) from the ratio between these two signals in the frequency domain.

For example, the calculation unit 120 calculates the inverse characteristic X(s) = H(s)^-1 of the calculated impulse response H(s), and sets the calculated inverse characteristic X(s) as the filter coefficients X(s) of the FIR filter 133. The inverse of the acoustic characteristic from the speaker 10 to the microphone 20 is used as the filter coefficients because it allows the original sound to be reproduced at the position of the microphone 20. Because of the division-by-zero protection applied when the filter coefficients X(s) are calculated, X(s)H(s) = 1 is not guaranteed.
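The inverse characteristic and its division-by-zero protection can be sketched per frequency bin as follows. This is an illustration only: it assumes H is the microphone spectrum divided by the source spectrum, and the guard (setting the coefficient to zero when the microphone bin is nearly zero) and the `eps` value are hypothetical choices, since the description does not state how the protection is done.

```python
def inverse_filter(source_spectrum, mic_spectrum, eps=1e-8):
    """Return X = 1 / H per bin, where H = mic / source (assumed direction).

    Bins whose microphone magnitude is below eps get a coefficient of 0
    instead of a division by (almost) zero, so X * H == 1 does not hold there.
    """
    coeffs = []
    for src, mic in zip(source_spectrum, mic_spectrum):
        if abs(mic) < eps:
            coeffs.append(0j)          # hypothetical guard value
        else:
            coeffs.append(src / mic)   # equals 1 / (mic / src)
    return coeffs


if __name__ == "__main__":
    source = [2 + 0j, 1 + 0j, 1 + 1j]
    mic = [1 + 0j, 0j, 2 + 2j]
    print(inverse_filter(source, mic))
```

Other guards, such as adding a small constant to the denominator, would serve the same purpose; whichever is used, the guarded bins are where the product X(s)H(s) departs from 1.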

The filter processing unit 130 performs frequency-domain filtering on an input signal using window function processing in which analysis frames overlap at a predetermined ratio. For example, the filter processing unit 130 receives the original sound as the input signal and filters it in this way. The filter processing unit 130 includes, for example, a window function processing unit 131, a transform unit 132, an FIR filter 133, an inverse transform unit 134, and an addition unit 135.

The window function processing unit 131 performs window function processing on the time-domain audio signal. For example, the window function processing unit 131 applies window function processing in which analysis frames overlap at a predetermined ratio, and cuts a plurality of analysis frames out of the time-domain audio signal. As illustrated in FIG. 2, the window function processing unit 131 outputs the cut-out analysis frames to the transform unit 132.

For example, the window function processing unit 131 receives the audio signal output from the signal acquisition unit 110. It multiplies the received audio signal by a Hanning window so that adjacent analysis frames overlap at the ratio set by the setting unit 142 described later, and cuts out a plurality of analysis frames. For example, the window function processing unit 131 applies the Hanning window so that adjacent analysis frames overlap by 50%. It then outputs the cut-out analysis frames to the transform unit 132. The Hanning window tapers to 0 at both ends, and copies of it shifted by 50% sum to exactly 1, which makes it well suited to suppressing the noise that arises at the window ends. Although the Hanning window is used as the window function here, the present invention is not limited to this. The window function processing unit 131 may use another window function with these properties, such as the Bartlett window.
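The two window properties mentioned above can be checked numerically. The sketch below assumes the periodic form of each window (one period spans exactly the frame length), which is the form for which the 50%-shifted copies sum to exactly 1; the function names are illustrative.

```python
import math


def hann(k: int, length: int) -> float:
    """Periodic Hanning (Hann) window sample; equals 0 at k = 0."""
    return 0.5 - 0.5 * math.cos(2.0 * math.pi * k / length)


def bartlett(k: int, length: int) -> float:
    """Periodic Bartlett (triangular) window sample; equals 0 at k = 0."""
    return 1.0 - abs(2.0 * k / length - 1.0)


def overlapped_sum(window, length: int):
    """Sum of a window and a copy shifted by half the frame length."""
    half = length // 2
    return [window(k, length) + window(k + half, length) for k in range(half)]


if __name__ == "__main__":
    for window in (hann, bartlett):
        print(window.__name__, overlapped_sum(window, 16))
```

For both windows every entry of the overlapped sum is 1 up to rounding, which is why 50% overlap addition with an identity filter reproduces the input in the region covered by two frames.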

Overlap is now described with reference to FIGS. 3 to 5, which are diagrams for explaining overlap. FIG. 3 shows an example of the waveform of an audio signal in the time domain. The horizontal axis of FIG. 3 indicates time [ms], and the vertical axis indicates amplitude. For example, when one sample of a signal acquired at 8 kHz sampling is represented with 16 bits, its value ranges from -32768 to +32767. Sampling at 8 kHz means taking one sample every 1/8000 of a second. FIG. 3 shows the case where an analysis frame 1a, an analysis frame 1b, and an analysis frame 1c are cut out of the time-domain audio signal. The analysis frame 1a and the analysis frame 1c overlap by 50%, and the analysis frame 1b and the analysis frame 1c overlap by 50%.

FIG. 4 shows the case where analysis frames that do not overlap are added. The signal of the analysis frame 2a in FIG. 4 is the signal of the analysis frame 1a after frequency-domain filtering and inverse transformation to the time domain. The signal of the analysis frame 2b is the signal of the analysis frame 1b after the same processing. As shown in FIG. 4, even when the analysis frame 2a and the analysis frame 2b are added, a gap 2c remains between the waveforms. The ends of the analysis frames therefore do not join continuously, and noise is introduced into the original audio signal.

FIG. 5 shows the case where overlapping analysis frames are added. The signals of the analysis frames 3a, 3b, and 3c in FIG. 5 are the signals of the analysis frames 1a, 1b, and 1c, respectively, after each has been multiplied by a Hanning window, filtered in the frequency domain, and inversely transformed to the time domain. As shown in FIG. 5, when the windowed analysis frames 3a and 3b are added, no gap exists at the ends of the waveforms, so the ends of the analysis frames join continuously. In addition, by further adding the analysis frame 3c, which overlaps the analysis frames 3a and 3b by 50% each, the portions of the waveform attenuated by the window function can be compensated. This method of adding adjacent analysis frames so that they overlap is referred to as "overlap addition", and the proportion of overlap is referred to as the "overlap ratio". Overlap addition at an overlap ratio of 50%, for example, is also referred to as "50% overlap addition".

The window function processing unit 131 performs window function processing on the audio signal at a predetermined overlap ratio. For example, it multiplies the audio signal by a Hanning window at an overlap ratio of (1 - 1/2^n) × 100%. Here, n is an integer of 1 or more and is incremented by 1 each time the filtering is executed, for example. In other words, each time the filtering is executed, the window function processing unit 131 performs window function processing on the audio signal while raising the overlap ratio in steps of 50%, 75%, 87.5%, and so on. The overlap ratio used by the window function processing unit 131 is not limited to this example. For example, the window function processing unit 131 may store integers n of 1 or more in association with arbitrary overlap ratios and select the overlap ratio corresponding to n.
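Both ways of choosing the overlap ratio described above, the formula and the stored association, can be sketched as follows. The function name and the contents of the lookup table are hypothetical examples.

```python
def overlap_ratio(n: int) -> float:
    """Overlap ratio (1 - 1/2**n) for an integer step n >= 1."""
    if n < 1:
        raise ValueError("n must be an integer of 1 or more")
    return 1.0 - 1.0 / (2 ** n)


# Alternative: an arbitrary association between n and an overlap ratio.
# These particular values are made up for illustration.
OVERLAP_TABLE = {1: 0.5, 2: 0.6, 3: 0.8}


if __name__ == "__main__":
    print([overlap_ratio(n) for n in (1, 2, 3)])
    print([OVERLAP_TABLE[n] for n in (1, 2, 3)])
```

With the formula, the spacing between frame starts halves at every step, so a frame of 256 samples would advance by 128, 64, and then 32 samples.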

The description returns to FIG. 1. The transform unit 132 applies the Fourier transform to the audio signal. For example, the transform unit 132 Fourier-transforms the audio signal of each of the analysis frames output by the window function processing unit 131. As illustrated in FIG. 2, the transform unit 132 outputs the Fourier-transformed audio signal to the FIR filter 133.

The FIR filter 133 performs frequency-domain filtering on the audio signal. For example, the FIR filter 133 receives the audio signal Fourier-transformed by the transform unit 132 and filters each analysis frame with the filter coefficients X(s) set by the calculation unit 120. As illustrated in FIG. 2, the FIR filter 133 outputs the filtered audio signal to the inverse transform unit 134. Although the filter processing unit 130 is described here as using the FIR filter 133 to reproduce the original sound at the position of the microphone 20, the present invention is not limited to this. For example, the filter processing unit 130 may use, in place of the FIR filter 133, a low-pass filter for suppressing noise contained in the audio signal.

The inverse transform unit 134 applies the inverse Fourier transform to the audio signal. For example, the inverse transform unit 134 inverse-Fourier-transforms the audio signal of each of the analysis frames output by the FIR filter 133. As illustrated in FIG. 2, the inverse transform unit 134 outputs the inverse-Fourier-transformed audio signal to the addition unit 135.

The addition unit 135 adds time-domain audio signals. For example, the addition unit 135 receives the audio signal of each analysis frame inverse-Fourier-transformed by the inverse transform unit 134 and performs overlap addition on the received signals. As illustrated in FIG. 2, the addition unit 135 outputs the added audio signal to the speaker 10 and the detection unit 141.

In this way, the filter processing unit 130 performs window function processing on the audio signal at an overlap ratio of (1 - 1/2^n) × 100% and performs frequency-domain filtering. The filter processing unit 130 outputs the filtered audio signal, which is also referred to as the "filtered sound". That is, the filter processing unit 130 outputs the filtered sound to the speaker 10 and the detection unit 141.
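The chain of windowing, transform, filtering, inverse transform, and overlap addition can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the device's implementation: it uses a naive DFT from the standard library to stay self-contained, a periodic Hanning window, a frame length equal to the number of filter coefficients, and a division by the summed window gain (`frame_len / (2 * hop)`) so that overlap ratios above 50% keep unit gain. That normalisation and all names are assumptions not stated in the description.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive discrete Fourier transform (O(N^2)); adequate for short frames."""
    n = len(values)
    sign = 1 if inverse else -1
    out = [sum(values[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
               for t in range(n)) for k in range(n)]
    return [v / n for v in out] if inverse else out


def filter_overlap_add(signal, coeffs, n=1):
    """Filter `signal` frame by frame at an overlap ratio of 1 - 1/2**n."""
    frame_len = len(coeffs)
    hop = frame_len // (2 ** n)                 # samples between frame starts
    window = [0.5 - 0.5 * math.cos(2 * math.pi * k / frame_len)
              for k in range(frame_len)]
    gain = frame_len / (2 * hop)                # sum of the overlapped windows
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + k] * window[k] for k in range(frame_len)]
        spectrum = dft(frame)                               # to frequency domain
        filtered = [c * s for c, s in zip(coeffs, spectrum)]  # per-bin filtering
        restored = dft(filtered, inverse=True)              # back to time domain
        for k in range(frame_len):
            out[start + k] += restored[k].real / gain       # overlap addition
    return out


if __name__ == "__main__":
    tone = [math.sin(0.3 * t) for t in range(64)]
    identity = [1.0] * 16
    result = filter_overlap_add(tone, identity, n=1)
    print(max(abs(result[t] - tone[t]) for t in range(16, 48)))
```

With an all-ones filter the interior of the output matches the input to rounding error, which is a convenient sanity check before substituting real coefficients. Multiplying spectra corresponds to circular convolution within each frame, so a practical filter may need additional care that this sketch leaves out.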

The ratio determination unit 140 determines the overlap ratio. For example, the ratio determination unit 140 determines the overlap ratio to be set in the filter processing unit 130. The ratio determination unit 140 includes, for example, a detection unit 141 and a setting unit 142.

Each time the overlap ratio is increased, the detection unit 141 calculates the correlation between the filtered sound and the original sound, and, based on the calculated correlations, detects the overlap ratio to be set in the filter processing unit 130. For example, the detection unit 141 calculates the ratio of the correlation calculated this time to the correlation calculated the previous time and, when that ratio is less than a threshold, detects the overlap ratio at which the previous correlation was calculated. As illustrated in FIG. 2, the detection unit 141 outputs the detected overlap ratio to the setting unit 142.
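The detection rule can be sketched as a small loop. The function below is an illustration under assumptions: the threshold value, the upper limit `max_n`, and the behaviour when the limit is reached are hypothetical, and `similarity_for` stands in for whatever produces the correlation at step n. The sample values are the coefficients of determination that the description reports for FIGS. 6 to 8.

```python
def detect_overlap_ratio(similarity_for, threshold, max_n=8):
    """Return the overlap ratio to set, following the ratio-versus-threshold rule.

    similarity_for(n) gives the similarity at overlap ratio 1 - 1/2**n.
    When current / previous falls below `threshold`, the previous ratio is kept.
    """
    previous = similarity_for(1)
    for n in range(2, max_n + 1):
        current = similarity_for(n)
        if current / previous < threshold:
            return 1.0 - 1.0 / (2 ** (n - 1))   # ratio used the previous time
        previous = current
    return 1.0 - 1.0 / (2 ** max_n)             # assumed fallback at the limit


if __name__ == "__main__":
    reported = {1: 0.994873018, 2: 0.999279900, 3: 0.999097137}
    print(detect_overlap_ratio(reported.get, threshold=1.001, max_n=3))
```

With these values the step from 50% to 75% raises the similarity by about 0.44%, while the step to 87.5% does not raise it at all, so a threshold slightly above 1 stops the search at 75%.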

The processing of the detection unit 141 is described below. For example, the detection unit 141 compares the filtered sound output by the filter processing unit 130 with the original sound output from the signal acquisition unit 110. For example, the detection unit 141 compares the sound filtered with 50% overlap addition against the original sound.

FIG. 6 is a diagram for explaining the process of comparing the sound filtered with 50% overlap addition against the original sound. The horizontal axis of FIG. 6 indicates the amplitude of the original-sound waveform in the time domain, and the vertical axis indicates the amplitude of the filtered-sound waveform in the time domain. Each point in FIG. 6 plots the amplitude value of a sample in the original-sound waveform against the amplitude value of the corresponding sample in the filtered-sound waveform. For example, a point with a horizontal-axis value of 8000 and a vertical-axis value of 8020 indicates that the amplitude of that sample in the original-sound waveform was shifted from 8000 to 8020 by the filtering.

As illustrated in FIG. 6, the detection unit 141 compares the similarity between the sound filtered with 50% overlap addition and the original sound. For example, the detection unit 141 calculates the coefficient of determination as the similarity between the filtered sound and the original sound, based on equation (1) below. The coefficient of determination is also written R^2. Although the coefficient of determination is used as the similarity here, other measures of similarity, such as the Euclidean distance, may be used instead.
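Equation (1) is not reproduced in this text. The usual definition of the coefficient of determination, written with the symbols y_i and f_i that the description defines, is given here as an assumed form; the mean \bar{y} of the sample values is an additional symbol introduced for this sketch.

```latex
R^2 = 1 - \frac{\sum_{i} \left( y_i - f_i \right)^2}{\sum_{i} \left( y_i - \bar{y} \right)^2},
\qquad
\bar{y} = \frac{1}{N} \sum_{i=1}^{N} y_i
```

Under this form R^2 equals 1 when every filtered sample lies exactly on the regression line and decreases as the residuals y_i - f_i grow.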

In equation (1), i denotes the sample number. y denotes a sample value, that is, an amplitude value of the filtered sound, and y_i denotes the sample value of the i-th sample. f_i denotes the value estimated by the regression equation. Taking x as the amplitude value of the original sound, the estimate f and the coefficient of determination R^2 in FIG. 6, for example, are as follows.
f = 1.00320669x + 0.0813444137
R^2 = 0.994873018

The detection unit 141 compares the sound filtered with 50% overlap addition and the original sound by calculating the coefficient of determination R^2 between them. The closer the waveform of the filtered sound is to the waveform of the original sound, the closer R^2 is to 1. The coefficient of determination R^2 is an example of the "correlation".
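A coefficient of determination of this kind can be computed from paired samples as sketched below. The sketch assumes an ordinary least-squares line f = a·x + b fitted to the (original, filtered) amplitude pairs and the usual definition of R^2 as one minus the ratio of residual to total sum of squares; the function name and sample data are illustrative.

```python
def regression_r_squared(original, filtered):
    """Fit filtered ~ slope * original + intercept and return (slope, intercept, R^2)."""
    count = len(original)
    mean_x = sum(original) / count
    mean_y = sum(filtered) / count
    sxx = sum((x - mean_x) ** 2 for x in original)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(original, filtered))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares of the filtered-sound samples.
    ss_res = sum((y - (slope * x + intercept)) ** 2
                 for x, y in zip(original, filtered))
    ss_tot = sum((y - mean_y) ** 2 for y in filtered)
    return slope, intercept, 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    original = [0.0, 1000.0, -2000.0, 3000.0, 8000.0]
    filtered = [5.0, 1010.0, -1990.0, 2985.0, 8020.0]
    print(regression_r_squared(original, filtered))
```

A slope near 1 and an intercept near 0 indicate that the filtering left the amplitudes almost unchanged, which matches the form of the regression lines quoted for FIGS. 6 to 8.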

In the same way as for the sound filtered with 50% overlap addition, the detection unit 141 also calculates the coefficient of determination R^2 against the original sound for the sounds filtered with 75% overlap addition and 87.5% overlap addition.

FIG. 7 is a diagram for explaining the processing that compares the filtered sound obtained with 75% overlap addition against the original sound. FIG. 7 is read in the same way as FIG. 6, so its description is omitted. As illustrated in FIG. 7, for example, the detection unit 141 calculates, based on equation (1), the coefficient of determination R² between the filtered sound obtained with 75% overlap addition and the original sound, together with the estimate f. In FIG. 7, the estimate f and the coefficient of determination R² are as follows.
f = 0.999553234x + 0.393565240
R² = 0.999279900

FIG. 8 is a diagram for explaining the processing that compares the filtered sound obtained with 87.5% overlap addition against the original sound. FIG. 8 is read in the same way as FIG. 6, so its description is omitted. As illustrated in FIG. 8, for example, the detection unit 141 calculates, based on equation (1), the coefficient of determination R² between the filtered sound obtained with 87.5% overlap addition and the original sound, together with the estimate f. In FIG. 8, the estimate f and the coefficient of determination R² are as follows.
f = 0.999484307x + 0.634147404
R² = 0.999097137

That is, when n = 1, the detection unit 141 calculates the coefficient of determination R² = 0.994873018 between the original sound and the filtered sound obtained with 50% overlap addition. When n = 2, it calculates R² = 0.999279900 for the filtered sound obtained with 75% overlap addition. When n = 3, it calculates R² = 0.999097137 for the filtered sound obtained with 87.5% overlap addition.

また、例えば、検出部１４１は、ｎ−１の時の決定係数とｎの時の決定係数とを比較し、一定割合以上増加しているか否かを判定する。一定割合以上増加している場合には、検出部１４１は、フィルタ処理部１３０に設定されたｎを１インクリメントする。一方、一定割合以上増加していない場合には、検出部１４１は、ｎ−１の時のオーバラップ割合を設定部１４２に出力する。ここで、一定割合としては、例えば、０．１％が設定される。 Further, for example, the detection unit 141 compares the determination coefficient at the time of n−1 with the determination coefficient at the time of n, and determines whether or not it has increased by a certain percentage or more. If it has increased by a certain percentage or more, the detection unit 141 increments n set in the filter processing unit 130 by one. On the other hand, if it has not increased by a certain ratio or more, the detection unit 141 outputs the overlap ratio at the time of n−1 to the setting unit 142. Here, for example, 0.1% is set as the fixed ratio.

For example, when n = 2, the detection unit 141 calculates (0.999279900 − 0.994873018) / 0.994873018 × 100 = 0.443%. Since this value is 0.1% or more, the detection unit 141 increments n set in the filter processing unit 130 by one.

For example, when n = 3, the detection unit 141 calculates (0.999097137 − 0.999279900) / 0.999279900 × 100 = −0.018%. Since this value is less than 0.1%, the detection unit 141 outputs the overlap ratio "75%" for n = 2 to the setting unit 142.
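
The two worked calculations above can be reproduced with a short sketch. The R² values are those given for FIGS. 6 to 8 and the 0.1% ratio is the one named in the text; the function and variable names are illustrative only.

```python
def relative_increase_percent(previous_r2, current_r2):
    """Increase of the current R^2 over the previous one, in percent."""
    return (current_r2 - previous_r2) / previous_r2 * 100.0


# R^2 for n = 1, 2, 3 (50%, 75%, 87.5% overlap), as given for FIGS. 6 to 8.
r2_by_n = {1: 0.994873018, 2: 0.999279900, 3: 0.999097137}
FIXED_RATIO_PERCENT = 0.1

step_2 = relative_increase_percent(r2_by_n[1], r2_by_n[2])
step_3 = relative_increase_percent(r2_by_n[2], r2_by_n[3])

# n = 2 improves by at least the fixed ratio, so n is incremented;
# n = 3 does not, so the overlap ratio for n = 2 (75%) is output.
print(round(step_2, 3), step_2 >= FIXED_RATIO_PERCENT)
print(round(step_3, 3), step_3 >= FIXED_RATIO_PERCENT)
```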

The detection unit 141 detects the overlap ratio beyond which the coefficient of determination no longer improves because increasing the overlap ratio any further only increases the amount of calculation required for the filter processing. FIG. 9 is a diagram for explaining the relationship between the overlap ratio and the coefficient of determination. The horizontal axis of FIG. 9 indicates the overlap ratio [%], and the vertical axis indicates the coefficient of determination. FIG. 9 plots the coefficients of determination of FIGS. 6 to 8. As illustrated in FIG. 9, increasing the overlap ratio connects the ends of adjacent analysis frames smoothly, so the coefficient of determination, that is, the sound quality, improves. Once the overlap ratio exceeds a certain value, however, the coefficient of determination stops improving, which suggests that the ends of adjacent analysis frames are already connected smoothly enough at that overlap ratio. Meanwhile, as the overlap ratio is increased to 50%, 75%, 87.5%, and so on, the amount of calculation of the filter processing grows by factors of 2, 4, 8, and so on. The overlap ratio at which the coefficient of determination stops improving also depends on the sound quality of the original sound and on the environment around the microphone 20. Applying the overlap ratio detected by the detection unit 141 to the window function processing therefore prevents an increase in the amount of calculation of the filter processing.
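
One way to read the factors of 2, 4, and 8 is that the number of analysis frames to process per frame length of signal grows as 1 / (1 − overlap). The sketch below works under that assumption, which the text does not state explicitly; the function name is hypothetical.

```python
def relative_cost(overlap_percent):
    """Frames processed per frame length of signal, relative to no overlap.

    Assumes the cost is proportional to the number of analysis frames,
    which advance by (1 - overlap) of a frame length each time.
    """
    if not 0.0 <= overlap_percent < 100.0:
        raise ValueError("overlap_percent must be in [0, 100)")
    return 1.0 / (1.0 - overlap_percent / 100.0)


for overlap in (50.0, 75.0, 87.5):
    print(overlap, relative_cost(overlap))
```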

In this way, the detection unit 141 calculates the correlation between the filtered sound and the original sound each time the overlap ratio is increased, and detects the overlap ratio to be set in the filter processing unit 130 based on the calculated correlations. The method described here detects the overlap ratio from the ratio between the correlation calculated this time and the correlation calculated the previous time, but the present invention is not limited to this. For example, the detection unit 141 may detect the overlap ratio at which the calculated correlation becomes equal to or greater than a threshold.

Returning to the description of FIG. 1, the setting unit 142 sets, for example, the overlap ratio detected by the detection unit 141 in the filter processing unit 130. For example, the setting unit 142 receives the overlap ratio from the detection unit 141 and sets it in the window function processing unit 131. In the example illustrated in FIG. 9, the setting unit 142 receives the overlap ratio "75%" for n = 2 from the detection unit 141 and sets it in the window function processing unit 131. As illustrated in FIG. 2, for example, the setting unit 142 outputs information for setting the overlap ratio to the window function processing unit 131.

次に、実施例１に係る音声処理装置１００の処理手順について説明する。図１０は、実施例１に係る音声処理装置の処理手順を示すフローチャートである。図１０に示す処理は、例えば、音声処理装置１００において、電源から電力が供給される間に所定の間隔で実行される。 Next, a processing procedure of the speech processing apparatus 100 according to the first embodiment will be described. FIG. 10 is a flowchart illustrating the processing procedure of the speech processing apparatus according to the first embodiment. The process illustrated in FIG. 10 is executed at a predetermined interval while power is supplied from the power supply, for example, in the audio processing apparatus 100.

図１０に示すように、処理タイミングになると（ステップＳ１０１，Ｙｅｓ）、信号取得部１１０は、原音をスピーカ１０から出力させる（ステップＳ１０２）。なお、処理タイミングになるまでは（ステップＳ１０１，Ｎｏ）、図１０に示す処理は、待機状態である。 As shown in FIG. 10, when the processing timing comes (step S101, Yes), the signal acquisition unit 110 outputs the original sound from the speaker 10 (step S102). Until the processing timing comes (No in step S101), the processing illustrated in FIG. 10 is in a standby state.

The microphone 20 collects the original sound output from the speaker 10 (step S103). The calculation unit 120 calculates the FIR filter 133 (step S104). That is, the calculation unit 120 calculates the filter coefficient X(s) to be applied to the FIR filter 133.

The filter processing unit 130 sets n = 1 (step S105). The filter processing unit 130 calculates the filtered sound by executing the window function processing at an overlap ratio of (1 − 1/2^n) × 100% (step S106).
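
The overlap ratio formula and the windowed overlap addition it drives can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: it uses a Hann window, normalizes by the accumulated window weight, and replaces the per-frame frequency-domain filtering with a caller-supplied `process` function that defaults to the identity. All names are hypothetical.

```python
import math


def overlap_percent(n):
    """Overlap ratio (1 - 1/2**n) * 100 in percent: 50, 75, 87.5, ..."""
    return (1.0 - 1.0 / 2 ** n) * 100.0


def overlap_add(signal, frame_len, n, process=lambda frame: frame):
    """Windowed overlap addition with a hop of frame_len / 2**n samples.

    `process` stands in for the per-frame filtering (identity by default).
    """
    if frame_len % 2 ** n != 0:
        raise ValueError("frame_len must be divisible by 2**n")
    hop = frame_len // 2 ** n
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * k / frame_len)
              for k in range(frame_len)]
    out = [0.0] * len(signal)
    weight = [0.0] * len(signal)
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + k] * window[k] for k in range(frame_len)]
        frame = process(frame)
        for k in range(frame_len):
            out[start + k] += frame[k]
            weight[start + k] += window[k]
    # Divide by the summed window so that overlapping frames add up to unity.
    return [o / w if w > 1e-12 else 0.0 for o, w in zip(out, weight)]


signal = [math.sin(0.1 * t) for t in range(64)]
rebuilt = overlap_add(signal, frame_len=16, n=2)  # 75% overlap, hop of 4
```

With the identity in place of the filter, every sample covered by a non-zero window weight is reconstructed up to rounding error, which is the sense in which overlapping frames join smoothly.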

割合決定部１４０は、フィルタ処理音と原音とを比較し、決定係数を算出する（ステップＳ１０７）。ｎ＝１ではない場合には（ステップＳ１０８，Ｎｏ）、割合決定部１４０は、ｎ−１の時の決定係数とｎの時の決定係数とを比較し、一定割合以上増加しているか否かを判定する（ステップＳ１０９）。 The ratio determining unit 140 compares the filtered sound with the original sound and calculates a determination coefficient (step S107). If n = 1 is not satisfied (step S108, No), the ratio determination unit 140 compares the determination coefficient at the time of n-1 with the determination coefficient at the time of n, and determines whether or not it has increased by a certain ratio or more. Is determined (step S109).

一定割合以上増加していない場合には（ステップＳ１０９，Ｎｏ）、割合決定部１４０は、ｎ−１の時のオーバラップ割合をフィルタ処理部１３０に設定する（ステップＳ１１１）。 If it has not increased by more than a certain ratio (No at Step S109), the ratio determining unit 140 sets the overlap ratio at the time of n-1 in the filter processing unit 130 (Step S111).

一方、一定割合以上増加している場合には（ステップＳ１０９，Ｙｅｓ）、割合決定部１４０は、フィルタ処理部１３０に設定されたｎを１インクリメントし（ステップＳ１１０）、ステップＳ１０６に移行する。 On the other hand, if it has increased by a certain ratio or more (step S109, Yes), the ratio determination unit 140 increments n set in the filter processing unit 130 by 1 (step S110), and proceeds to step S106.

一方、ｎ＝１である場合には（ステップＳ１０８，Ｙｅｓ）、割合決定部１４０は、フィルタ処理部１３０に設定されたｎを１インクリメントし（ステップＳ１１０）、ステップＳ１０６に移行する。 On the other hand, when n = 1 (step S108, Yes), the ratio determination unit 140 increments n set in the filter processing unit 130 by 1 (step S110), and proceeds to step S106.
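
Steps S105 to S111 can be sketched as a loop. In this minimal illustration, `r2_for_n` is a hypothetical callable that stands in for steps S106 and S107 (filtering at the overlap ratio for n and computing the coefficient of determination), and `max_n` is an added safeguard that does not appear in the flowchart.

```python
def select_overlap_n(r2_for_n, min_increase_percent=0.1, max_n=8):
    """Return the n whose overlap ratio (1 - 1/2**n) * 100% is to be set."""
    n = 1                                      # S105
    previous_r2 = None
    while n <= max_n:
        current_r2 = r2_for_n(n)               # S106, S107
        if previous_r2 is not None:            # S108: n is not 1
            increase = (current_r2 - previous_r2) / previous_r2 * 100.0
            if increase < min_increase_percent:    # S109, No
                return n - 1                   # S111
        previous_r2 = current_r2
        n += 1                                 # S110
    return max_n                               # assumption: stop at the cap


# R^2 values given for FIGS. 6 to 8 (n = 1, 2, 3).
measured = {1: 0.994873018, 2: 0.999279900, 3: 0.999097137}
chosen_n = select_overlap_n(measured.__getitem__)
chosen_overlap = (1.0 - 1.0 / 2 ** chosen_n) * 100.0
```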

Next, the effects of the audio processing apparatus 100 according to the first embodiment will be described. The audio processing apparatus 100 performs frequency-domain filter processing on an input signal using window function processing in which analysis frames overlap at a predetermined ratio. Each time it increases the ratio at which the analysis frames overlap, the audio processing apparatus 100 calculates the similarity between the filtered signal and an arbitrary signal, and detects the ratio to be set in the filter processing unit based on the calculated similarities. The audio processing apparatus 100 can therefore suppress the amount of calculation of the filter processing, for example in an apparatus in which the detected ratio is set. For example, when the original sound is of high quality, or in an environment where noise is unlikely to be mixed in, the audio processing apparatus 100 can improve the sound quality by increasing the overlap ratio. Conversely, when the original sound is of low quality, or in an environment where noise is easily mixed in, the audio processing apparatus 100 can improve the sound quality while suppressing the amount of calculation of the filter processing by preventing an excessive increase in the overlap ratio.

Further, for example, the audio processing apparatus 100 calculates the ratio between the correlation calculated this time and the correlation calculated the previous time and, when the calculated ratio is less than a threshold, detects the overlap ratio at which the previous correlation was calculated. The audio processing apparatus 100 can therefore suppress the amount of calculation of the filter processing in an apparatus in which the detected ratio is set.

Further, for example, each time it increases the ratio at which the analysis frames overlap, the audio processing apparatus 100 calculates the correlation between the filtered signal and the original sound, and detects the ratio to be set in the filter processing unit based on the calculated correlations. An apparatus in which the detected ratio is set can therefore obtain a sound close to the original sound with a small amount of calculation.

Further, for example, because the audio processing apparatus 100 sets the detected overlap ratio in the filter processing unit, the amount of calculation of the filter processing can be suppressed without the user having to repeat trial and error by listening again and again.

また、例えば、音声処理装置１００は、フィルタ処理の計算量を抑制するので、計算量にかかる消費電力を抑制することができる。これは、例えば、携帯電話や携帯音楽プレーヤーなどのバッテリーで駆動する装置において特に有効である。 Further, for example, since the speech processing apparatus 100 suppresses the calculation amount of the filter processing, it is possible to suppress the power consumption related to the calculation amount. This is particularly effective in a device driven by a battery such as a cellular phone or a portable music player.

また、例えば、音声処理装置１００は、フィルタ処理の計算量を抑制するので、計算量にかかる装置の発熱を抑制することができる。これは、例えば、携帯電話や携帯音楽プレーヤーなど、ユーザが携帯する装置において特に有効である。 For example, since the speech processing apparatus 100 suppresses the calculation amount of the filter processing, it is possible to suppress the heat generation of the apparatus related to the calculation amount. This is particularly effective in devices carried by the user, such as mobile phones and portable music players.

実施例１では、フィルタ処理音を原音に近づける場合を説明した。しかし、本発明は、これに限定されるものではなく、例えば、フィルタ処理音を任意の信号に近づけることもできる。よって、実施例２では、音声処理装置がフィルタ処理音を時間領域のＦＩＲフィルタ処理を実行した音声信号に近づける場合を説明する。 In the first embodiment, the case where the filtered sound is brought close to the original sound has been described. However, the present invention is not limited to this, and for example, the filtered sound can be brought close to an arbitrary signal. Therefore, in the second embodiment, a case will be described in which the sound processing apparatus brings the filtered sound close to the sound signal that has been subjected to the time domain FIR filter processing.

An example of the functional configuration of the audio processing apparatus according to the second embodiment will be described. FIG. 11 is a block diagram illustrating the functional configuration of the audio processing apparatus according to the second embodiment. As illustrated in FIG. 11, the audio processing apparatus 200 includes a signal acquisition unit 110, a calculation unit 120, a filter processing unit 130, a ratio determination unit 140, and an FIR filter 210. Of these, the signal acquisition unit 110, the calculation unit 120, the filter processing unit 130, and the ratio determination unit 140 illustrated in FIG. 11 are the same as those illustrated in FIG. 1, so their description is omitted.

FIG. 12 is a diagram for explaining an example of the signal flow in the audio processing apparatus according to the second embodiment. Each processing function illustrated in FIG. 12 corresponds to the processing function with the same reference numeral illustrated in FIG. 11. The signal flow is described together with the processing functions of the audio processing apparatus 200.

The FIR filter 210 performs, for example, time-domain FIR filter processing on an audio signal. For example, the FIR filter 210 performs the filter processing with the filter coefficient X(s) set by the calculation unit 120. As illustrated in FIG. 12, for example, the FIR filter 210 outputs the audio signal subjected to the time-domain FIR filter processing to the detection unit 141.
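
Time-domain FIR filter processing of this kind amounts to a direct convolution of the signal with the filter coefficients. The sketch below is a generic illustration, not the patent's FIR filter 210; the function name and the sample coefficients are hypothetical.

```python
def fir_filter(signal, coefficients):
    """Direct-form FIR filter: y[t] = sum over k of h[k] * x[t - k].

    Samples before the start of the signal are treated as zero, and the
    output has the same length as the input.
    """
    output = []
    for t in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(coefficients):
            if t - k >= 0:
                acc += h * signal[t - k]
        output.append(acc)
    return output


# A two-tap filter that sums each sample with its predecessor.
reference = fir_filter([1.0, 2.0, 3.0], [1.0, 1.0])
```

Each output sample costs one multiply-add per coefficient, which is why the frequency-domain route becomes attractive for long filters.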

次に、実施例２に係る音声処理装置２００の処理手順について説明する。図１３は、実施例２に係る音声処理装置の処理手順を示すフローチャートである。図１３に示す処理は、例えば、音声処理装置２００において、電源から電力が供給される間に所定の間隔で実行される。 Next, the processing procedure of the speech processing apparatus 200 according to the second embodiment will be described. FIG. 13 is a flowchart illustrating the processing procedure of the sound processing apparatus according to the second embodiment. The process shown in FIG. 13 is executed at a predetermined interval while power is supplied from the power supply in the audio processing device 200, for example.

図１３に示すように、処理タイミングになると（ステップＳ２０１，Ｙｅｓ）、信号取得部１１０は、原音をスピーカ１０から出力させる（ステップＳ２０２）。なお、処理タイミングになるまでは（ステップＳ２０１，Ｎｏ）、図１３に示す処理は、待機状態である。 As shown in FIG. 13, when the processing timing comes (step S201, Yes), the signal acquisition unit 110 outputs the original sound from the speaker 10 (step S202). Until the processing timing comes (No in step S201), the processing illustrated in FIG. 13 is in a standby state.

The microphone 20 collects the original sound output from the speaker 10 (step S203). The calculation unit 120 calculates the FIR filter 133 (step S204). That is, the calculation unit 120 calculates the filter coefficient X(s) to be applied to the FIR filter 133.

The filter processing unit 130 sets n = 1 (step S205). The filter processing unit 130 calculates the filtered sound by executing the window function processing at an overlap ratio of (1 − 1/2^n) × 100% (step S206).

割合決定部１４０は、ＦＩＲフィルタ２１０によって時間領域のＦＩＲフィルタ処理を実行した音声信号と、フィルタ処理音とを比較し、決定係数を算出する（ステップＳ２０７）。ｎ＝１ではない場合には（ステップＳ２０８，Ｎｏ）、割合決定部１４０は、ｎ−１の時の決定係数とｎの時の決定係数とを比較し、一定割合以上増加しているか否かを判定する（ステップＳ２０９）。 The ratio determining unit 140 compares the audio signal that has been subjected to the time-domain FIR filter processing by the FIR filter 210 with the filtered sound, and calculates a determination coefficient (step S207). If n = 1 is not satisfied (step S208, No), the ratio determination unit 140 compares the determination coefficient at the time of n-1 with the determination coefficient at the time of n, and determines whether or not it has increased by a certain ratio or more. Is determined (step S209).

一定割合以上増加していない場合には（ステップＳ２０９，Ｎｏ）、割合決定部１４０は、ｎ−１の時のオーバラップ割合をフィルタ処理部１３０に設定する（ステップＳ２１１）。 If it has not increased by more than a certain ratio (No at Step S209), the ratio determining unit 140 sets the overlap ratio at n-1 in the filter processing unit 130 (Step S211).

一方、一定割合以上増加している場合には（ステップＳ２０９，Ｙｅｓ）、割合決定部１４０は、フィルタ処理部１３０に設定されたｎを１インクリメントし（ステップＳ２１０）、ステップＳ２０６に移行する。 On the other hand, if it has increased by a certain percentage or more (step S209, Yes), the percentage determination unit 140 increments n set in the filter processing unit 130 by 1 (step S210), and proceeds to step S206.

一方、ｎ＝１である場合には（ステップＳ２０８，Ｙｅｓ）、割合決定部１４０は、フィルタ処理部１３０に設定されたｎを１インクリメントし（ステップＳ２１０）、ステップＳ２０６に移行する。 On the other hand, if n = 1 (step S208, Yes), the ratio determining unit 140 increments n set in the filter processing unit 130 by 1 (step S210), and proceeds to step S206.

Next, the effects of the audio processing apparatus 200 according to the second embodiment will be described. The audio processing apparatus 200 performs frequency-domain filter processing on an input signal using window function processing in which analysis frames overlap at a predetermined ratio. Each time it increases the ratio at which the analysis frames overlap, the audio processing apparatus 200 calculates the similarity between the filtered signal and an arbitrary signal, and detects the ratio to be set in the filter processing unit based on the calculated similarities. The audio processing apparatus 200 can therefore suppress the amount of calculation of the filter processing while bringing the filtered signal close to the arbitrary signal. For example, it can bring the filtered sound close to an audio signal subjected to time-domain FIR filter processing while suppressing the amount of calculation of the filter processing. Likewise, it can bring the filtered sound close to the ideal sound that is intended for the audience while suppressing the amount of calculation of the filter processing.

さて、これまで本発明の実施例について説明したが、本発明は上述した実施例以外にも、その他の実施例にて実施されても良い。そこで、以下では、その他の実施例について説明する。 Although the embodiments of the present invention have been described so far, the present invention may be implemented in other embodiments besides the above-described embodiments. Therefore, other embodiments will be described below.

For example, in the first and second embodiments, the detection unit 141 was described as detecting the overlap ratio when the coefficient of determination at n has not increased by the fixed ratio or more relative to the coefficient of determination at n − 1. However, the present invention is not limited to this. For example, the detection unit 141 may detect the overlap ratio using a threshold.

例えば、検出部１４１は、ｎの時に算出した決定係数が閾値以上となった場合にオーバラップ割合を検出する。例えば、検出部１４１は、ｎの時に算出した決定係数が、閾値「０．９９９９４」以上となった場合に、ｎの時のオーバラップ割合を検出する。例えば、検出部１４１は、検出したオーバラップ割合を設定部１４２に出力する。なお、ここでは、閾値を０．９９９９４として説明するが、本発明はこれに限定されるものではなく、音声処理装置１００，２００を利用する者が任意の値に設定することができる。 For example, the detection unit 141 detects the overlap ratio when the determination coefficient calculated at n is equal to or greater than a threshold value. For example, when the determination coefficient calculated at n is equal to or greater than the threshold value “0.99994”, the detection unit 141 detects the overlap ratio at n. For example, the detection unit 141 outputs the detected overlap ratio to the setting unit 142. Here, the threshold value is described as 0.99994, but the present invention is not limited to this, and a person who uses the speech processing apparatuses 100 and 200 can set an arbitrary value.

The detection unit 141 detects the overlap ratio using a threshold because, in some cases, low-quality output is acceptable and raising the sound quality any further is pointless. For example, if sound quality comparable to FM radio is sufficient for the output, "0.99994" can be used as the threshold for the coefficient of determination. FIG. 14 is a diagram for explaining the relationship between the coefficient of determination and the sound quality. For FIG. 14, an original sound of 16-bit data sampled at 44.1 kHz, comparable in quality to an audio CD, was converted into 8-bit data sampled at 44.1 kHz, comparable in quality to FM radio, and the coefficient of determination was obtained. The horizontal axis of FIG. 14 indicates the waveform amplitude of the 16-bit data, and the vertical axis indicates the waveform amplitude of the 8-bit data. FIG. 14 is read in the same way as FIG. 6, so its description is omitted. In FIG. 14, the estimate f and the coefficient of determination R² are as follows.
f = 1.000064x − 0.977042
R² = 0.999940

In other words, if sound quality comparable to FM radio is sufficient for the output, a coefficient of determination of "0.99994" or more can be considered sufficient. In that case, the detection unit 141 can suppress the amount of calculation of the filter processing by detecting the overlap ratio at which the coefficient of determination becomes "0.99994" or more.

As described above, the threshold for the coefficient of determination can be changed arbitrarily, but a value of "0.999" or less is considered undesirable. FIG. 15 is a diagram for explaining the relationship between the coefficient of determination and the sound quality. FIG. 15 shows the case where, for the purpose of noise removal, a low-pass filter with an attenuation of 0 dB at 1 kHz and −100 dB at 10 kHz and above is applied to audio of CD quality. The horizontal axis of FIG. 15 indicates the waveform amplitude of the audio CD, and the vertical axis indicates the waveform amplitude after the low-pass filter is applied. FIG. 15 is read in the same way as FIG. 6, so its description is omitted. In FIG. 15, the estimate f and the coefficient of determination R² are as follows.
f = 0.99675930x + 0.00729942
R² = 0.99899060

In the case of FIG. 15, the cymbal sound contained in the audio CD had become almost inaudible. In other words, a coefficient of determination of about "0.999" is not considered sufficient to preserve the sound quality of the original sound.

ここで、検出部１４１が閾値を用いてオーバラップ割合を検出する場合の処理手順について説明する。図１６は、実施例３に係る音声処理装置の処理手順を示すフローチャートである。図１６では、実施例１において説明した音声処理装置１００が閾値を用いてオーバラップ割合を検出する場合を説明する。図１６に示す処理は、例えば、音声処理装置１００において、電源から電力が供給される間に所定の間隔で実行される。 Here, a processing procedure in a case where the detection unit 141 detects an overlap ratio using a threshold value will be described. FIG. 16 is a flowchart illustrating the processing procedure of the speech processing apparatus according to the third embodiment. FIG. 16 illustrates a case where the speech processing apparatus 100 described in the first embodiment detects an overlap ratio using a threshold value. The process shown in FIG. 16 is executed at a predetermined interval while power is supplied from the power supply, for example, in the audio processing apparatus 100.

図１６に示すように、処理タイミングになると（ステップＳ３０１，Ｙｅｓ）、信号取得部１１０は、原音をスピーカ１０から出力させる（ステップＳ３０２）。なお、処理タイミングになるまでは（ステップＳ３０１，Ｎｏ）、図１６に示す処理は、待機状態である。 As shown in FIG. 16, when the processing timing comes (Yes in step S301), the signal acquisition unit 110 outputs the original sound from the speaker 10 (step S302). Until the processing timing comes (No in step S301), the processing illustrated in FIG. 16 is in a standby state.

The microphone 20 collects the original sound output from the speaker 10 (step S303). The calculation unit 120 calculates the FIR filter 133 (step S304). That is, the calculation unit 120 calculates the filter coefficient X(s) to be applied to the FIR filter 133.

The filter processing unit 130 sets n = 1 (step S305). The filter processing unit 130 calculates the filtered sound by executing the window function processing at an overlap ratio of (1 − 1/2^n) × 100% (step S306).

割合決定部１４０は、フィルタ処理音と原音とを比較し、決定係数を算出する（ステップＳ３０７）。割合決定部１４０は、決定係数が閾値以上か否かを判定する（ステップＳ３０８）。決定係数が閾値以上でない場合には（ステップＳ３０８，Ｎｏ）、割合決定部１４０は、フィルタ処理部１３０に設定されたｎを１インクリメントし（ステップＳ３０９）、ステップＳ３０６に移行する。 The ratio determining unit 140 compares the filtered sound with the original sound and calculates a determination coefficient (step S307). The ratio determining unit 140 determines whether or not the determination coefficient is equal to or greater than a threshold value (step S308). If the determination coefficient is not equal to or greater than the threshold (No at Step S308), the ratio determination unit 140 increments n set in the filter processing unit 130 by 1 (Step S309), and proceeds to Step S306.

一方、決定係数が閾値以上である場合には（ステップＳ３０８，Ｙｅｓ）、割合決定部１４０は、オーバラップ割合をフィルタ処理部１３０に設定する（ステップＳ３１０）。 On the other hand, when the determination coefficient is greater than or equal to the threshold (step S308, Yes), the ratio determining unit 140 sets the overlap ratio in the filter processing unit 130 (step S310).
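
The threshold-based procedure of steps S305 to S310 can be sketched in the same way. Here `r2_for_n` is a hypothetical callable standing in for steps S306 and S307, and `max_n` is an added safeguard for the case where the threshold is never reached, which the flowchart does not address. The R² values in the usage example are made up for illustration.

```python
def select_overlap_n_by_threshold(r2_for_n, threshold=0.99994, max_n=8):
    """Return the first n whose R^2 reaches the threshold, or None."""
    for n in range(1, max_n + 1):          # S305, S309
        if r2_for_n(n) >= threshold:       # S306 to S308
            return n                       # S310: set this overlap ratio
    return None                            # assumption: give up at the cap


# Hypothetical R^2 values for n = 1, 2, 3 (not taken from the patent).
example_r2 = {1: 0.9950, 2: 0.9993, 3: 0.99995}
chosen_n = select_overlap_n_by_threshold(example_r2.__getitem__, max_n=3)
```

Unlike the ratio-based procedure, this variant sets the overlap ratio for the current n rather than for n − 1, because the current n is the first one that meets the target quality.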

図１７は、実施例３に係る音声処理装置の処理手順を示すフローチャートである。図１７では、実施例２において説明した音声処理装置２００が閾値を用いてオーバラップ割合を検出する場合を説明する。図１７に示す処理は、例えば、音声処理装置２００において、電源から電力が供給される間に所定の間隔で実行される。 FIG. 17 is a flowchart illustrating the processing procedure of the speech processing apparatus according to the third embodiment. FIG. 17 illustrates a case where the speech processing apparatus 200 described in the second embodiment detects an overlap ratio using a threshold value. The process shown in FIG. 17 is executed at a predetermined interval while power is supplied from the power supply in the audio processing device 200, for example.

図１７に示すように、処理タイミングになると（ステップＳ４０１，Ｙｅｓ）、信号取得部１１０は、原音をスピーカ１０から出力させる（ステップＳ４０２）。なお、処理タイミングになるまでは（ステップＳ４０１，Ｎｏ）、図１７に示す処理は、待機状態である。 As shown in FIG. 17, when the processing timing comes (step S401, Yes), the signal acquisition unit 110 outputs the original sound from the speaker 10 (step S402). Until the processing timing comes (No in step S401), the processing illustrated in FIG. 17 is in a standby state.

The microphone 20 picks up the original sound output from the speaker 10 (step S403). The calculation unit 120 calculates the FIR filter 133 (step S404). That is, the calculation unit 120 calculates the filter coefficients X(s) to be applied to the FIR filter 133.

The filter processing unit 130 sets n = 1 (step S405). The filter processing unit 130 then calculates the filtered sound obtained by performing the window function processing at an overlap ratio of (1 - 1/2^n) × 100% (step S406).

The ratio determination unit 140 compares the audio signal that has undergone time-domain FIR filter processing by the FIR filter 210 with the filtered sound, and calculates a determination coefficient (step S407). The ratio determination unit 140 then determines whether the determination coefficient is equal to or greater than a threshold (step S408). If the determination coefficient is below the threshold (step S408, No), the ratio determination unit 140 increments the value of n set in the filter processing unit 130 by 1 (step S409), and the process returns to step S406.

On the other hand, if the determination coefficient is equal to or greater than the threshold (step S408, Yes), the ratio determination unit 140 sets the overlap ratio in the filter processing unit 130 (step S410).
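The two signals compared in step S407 can be illustrated with a small sketch. It is not the implementation of the apparatus: the periodic Hann window, the naive DFT (used only to avoid external dependencies), the R² definition of the determination coefficient, and all function and parameter names are assumptions made for illustration.

```python
import cmath
import math


def dft(frame, inverse=False):
    """Naive O(N^2) DFT; stands in for an FFT routine."""
    size = len(frame)
    sign = 1 if inverse else -1
    out = []
    for k in range(size):
        acc = sum(frame[t] * cmath.exp(sign * 2j * math.pi * k * t / size)
                  for t in range(size))
        out.append(acc / size if inverse else acc)
    return out


def frequency_domain_filter(signal, taps, frame_len, n):
    """Windowed overlap-add filtering at an overlap ratio of 1 - 1/2**n.

    Assumes n >= 1 and that frame_len is divisible by 2**n.
    """
    hop = frame_len // 2 ** n                      # frame advance in samples
    window = [0.5 - 0.5 * math.cos(2 * math.pi * k / frame_len)
              for k in range(frame_len)]           # periodic Hann (assumed)
    response = dft(list(taps) + [0.0] * (frame_len - len(taps)))
    gain = 2 ** (n - 1)                            # sum of the overlapped Hann windows
    output = [0.0] * len(signal)
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + k] * window[k] for k in range(frame_len)]
        spectrum = [s * r for s, r in zip(dft(frame), response)]
        for k, value in enumerate(dft(spectrum, inverse=True)):
            output[start + k] += value.real / gain
    return output


def time_domain_fir(signal, taps):
    """Direct-form FIR filtering, used as the reference signal."""
    return [sum(taps[j] * signal[i - j] for j in range(len(taps)) if i - j >= 0)
            for i in range(len(signal))]


def determination_coefficient(reference, estimate):
    """Assumed definition: R^2 of `estimate` against `reference`."""
    mean = sum(reference) / len(reference)
    ss_res = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    ss_tot = sum((r - mean) ** 2 for r in reference)
    return 1.0 - ss_res / ss_tot
```

With a one-tap filter `[1.0]`, the frequency-domain output matches the time-domain reference away from the signal edges for every n, because the shifted Hann windows sum to a constant. The sketch does not zero-pad frames, so for longer filters the frame-wise product is a circular convolution and the output will generally differ from the reference.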

Further, among the processes described in the first and second embodiments, all or part of the processes described as being performed automatically can also be performed manually. Conversely, all or part of the processes described as being performed manually can be performed automatically by known methods. In addition, the processing procedures, control procedures, specific names, and information including various data and parameters shown in the above description and drawings can be changed arbitrarily unless otherwise specified.

The components of the audio processing apparatuses 100 and 200 shown in FIGS. 1 and 11 are functional and conceptual, and need not be physically configured as illustrated. That is, the specific form of distribution and integration of the audio processing apparatuses 100 and 200 is not limited to the illustrated one; all or part of them can be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like. For example, the audio processing apparatus 100 need not include the setting unit 142. For example, the overlap ratio detected by the audio processing apparatus 100 may be set in another apparatus.

The audio processing apparatuses 100 and 200 can also be realized by installing their functions in a known information processing apparatus. The known information processing apparatus corresponds to, for example, a personal computer, a mobile phone, a PHS (Personal Handy-phone System) terminal, a mobile communication terminal, or a PDA (Personal Digital Assistant).

FIG. 18 is a diagram illustrating an example of a computer that executes the audio processing program. As shown in FIG. 18, the computer 300 includes a CPU 301 that executes various arithmetic processes, an input device 302 that accepts data input from a user, and a monitor 303. The computer 300 also includes a medium reading device 304 that reads programs and the like from a storage medium, an interface device 305 for connecting to other devices, and a wireless communication device 306 for connecting to other devices wirelessly. The computer 300 further includes a RAM (Random Access Memory) 307 that temporarily stores various kinds of information, and a hard disk device 308. The devices 301 to 308 are connected to a bus 309. Although not illustrated, the computer 300 is connected to a microphone and a speaker.

The hard disk device 308 stores an audio processing program having the same functions as the processing units shown in FIGS. 1 and 11, namely the filter processing unit 130 and the detection unit 141. The hard disk device 308 also stores various data for realizing the audio processing program.

The CPU 301 reads each program stored in the hard disk device 308, loads it into the RAM 307, and performs various kinds of processing. These programs can cause the computer to function as the filter processing unit 130 and the detection unit 141 shown in FIGS. 1 and 9.

Note that the audio processing program described above does not necessarily have to be stored in the hard disk device 308. For example, the computer 300 may read and execute a program stored in a computer-readable recording medium. Examples of computer-readable recording media include portable recording media such as CD-ROMs, DVD discs, and USB memories, semiconductor memories such as flash memory, and hard disk drives. Alternatively, the program may be stored in a device connected to a public line, the Internet, a LAN (Local Area Network), a WAN (Wide Area Network), or the like, and the computer 300 may read the program from that device and execute it.

The following supplementary notes are further disclosed with respect to the embodiments including the above examples.

(Supplementary note 1) An audio processing apparatus comprising:
a filter processing unit that performs frequency-domain filter processing on an input signal using window function processing in which analysis frames overlap at a predetermined ratio; and
a detection unit that, each time the ratio is increased, calculates the similarity between the signal after the filter processing and an arbitrary signal, and detects, based on the calculated similarities, the ratio to be set in the filter processing unit.

(Supplementary note 2) The audio processing apparatus according to supplementary note 1, wherein the detection unit calculates the ratio between the similarity calculated this time and the similarity calculated the previous time, and, when the calculated ratio is less than a threshold, detects the overlap ratio at which the previously calculated similarity was calculated.

(Supplementary note 3) The audio processing apparatus according to supplementary note 1, wherein, when the calculated similarity is equal to or greater than a threshold, the detection unit detects the overlap ratio at which that similarity was calculated.

(Supplementary note 4) The audio processing apparatus according to any one of supplementary notes 1 to 3, wherein the detection unit uses the input signal as the arbitrary signal.

(Supplementary note 5) The audio processing apparatus according to any one of supplementary notes 1 to 3, wherein the detection unit uses, as the arbitrary signal, the signal obtained after time-domain filter processing has been performed on the input signal.

(Supplementary note 6) The audio processing apparatus according to any one of supplementary notes 1 to 5, further comprising a setting unit that sets the ratio detected by the detection unit in the filter processing unit.

(Supplementary note 7) An audio processing method executed by a computer, the method comprising:
performing frequency-domain filter processing on an input signal using window function processing in which analysis frames overlap at a predetermined ratio; and
each time the ratio is increased, calculating the similarity between the signal after the filter processing and an arbitrary signal, and detecting, based on the calculated similarities, the ratio to be set for the filter processing.

(Supplementary note 8) The audio processing method according to supplementary note 7, wherein the detecting calculates the ratio between the similarity calculated this time and the similarity calculated the previous time, and, when the calculated ratio is less than a threshold, detects the overlap ratio at which the previously calculated similarity was calculated.

(Supplementary note 9) The audio processing method according to supplementary note 7, wherein, when the calculated similarity is equal to or greater than a threshold, the detecting detects the overlap ratio at which that similarity was calculated.

(Supplementary note 10) The audio processing method according to any one of supplementary notes 7 to 9, wherein the detecting uses the input signal as the arbitrary signal.

(Supplementary note 11) The audio processing method according to any one of supplementary notes 7 to 9, wherein the detecting uses, as the arbitrary signal, the signal obtained after time-domain filter processing has been performed on the input signal.

(Supplementary note 12) The audio processing method according to any one of supplementary notes 7 to 11, wherein the ratio detected by the detecting is set for the filter processing.

(Supplementary note 13) An audio processing program that causes a computer to execute processing comprising:
performing frequency-domain filter processing on an input signal using window function processing in which analysis frames overlap at a predetermined ratio; and
each time the ratio is increased, calculating the similarity between the signal after the filter processing and an arbitrary signal, and detecting, based on the calculated similarities, the ratio to be set for the filter processing.

(Supplementary note 14) The audio processing program according to supplementary note 13, wherein the detecting calculates the ratio between the similarity calculated this time and the similarity calculated the previous time, and, when the calculated ratio is less than a threshold, detects the overlap ratio at which the previously calculated similarity was calculated.

(Supplementary note 15) The audio processing program according to supplementary note 13, wherein, when the calculated similarity is equal to or greater than a threshold, the detecting detects the overlap ratio at which that similarity was calculated.

(Supplementary note 16) The audio processing program according to any one of supplementary notes 13 to 15, wherein the detecting uses the input signal as the arbitrary signal.

(Supplementary note 17) The audio processing program according to any one of supplementary notes 13 to 15, wherein the detecting uses, as the arbitrary signal, the signal obtained after time-domain filter processing has been performed on the input signal.

(Supplementary note 18) The audio processing program according to any one of supplementary notes 13 to 17, wherein the ratio detected by the detecting is set for the filter processing.
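The stopping rule of supplementary notes 2, 8, and 14 can be sketched as follows. The notes do not state which similarity is the numerator, so this sketch assumes the ratio is the current similarity divided by the previous one; the function name and the list-based interface are hypothetical, and the similarities are assumed to be positive.

```python
def detect_by_similarity_ratio(similarities, ratio_threshold):
    """Pick the overlap ratio at which further overlap stops paying off.

    similarities[i] is the similarity measured at n = i + 1, that is, at an
    overlap ratio of 1 - 1/2**(i + 1). Returns (n, overlap_ratio) for the
    previous measurement once current / previous falls below the threshold,
    or None if that never happens.
    """
    for i in range(1, len(similarities)):
        improvement = similarities[i] / similarities[i - 1]  # assumed direction
        if improvement < ratio_threshold:
            n = i  # the previous measurement (index i - 1) was taken at n = i
            return n, 1.0 - 1.0 / 2 ** n
    return None
```

Unlike the absolute threshold of supplementary notes 3, 9, and 15, this rule stops when an additional doubling of the number of overlapping frames yields only a small relative gain in similarity.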

DESCRIPTION OF SYMBOLS
10 Speaker
20 Microphone
100, 200 Audio processing apparatus
110 Signal acquisition unit
120 Calculation unit
130 Filter processing unit
131 Window function processing unit
132 Transform unit
133 FIR filter
134 Inverse transform unit
135 Addition unit
140 Ratio determination unit
141 Detection unit
142 Setting unit
210 FIR filter

Claims

An audio processing apparatus comprising:
a filter processing unit that performs frequency-domain filter processing on an input signal using window function processing in which analysis frames overlap at a predetermined ratio; and
a detection unit that, each time the ratio is increased, calculates the similarity between the signal after the filter processing and a signal based on the input signal, and detects, based on the calculated similarities, the ratio to be set in the filter processing unit.

The audio processing apparatus according to claim 1, wherein the detection unit calculates the ratio between the similarity calculated this time and the similarity calculated the previous time, and, when the calculated ratio is less than a threshold, detects the overlap ratio at which the previously calculated similarity was calculated.

The audio processing apparatus according to claim 1, wherein, when the calculated similarity is equal to or greater than a threshold, the detection unit detects the overlap ratio at which that similarity was calculated.

The audio processing apparatus according to claim 1, wherein the detection unit uses the input signal itself as the signal based on the input signal.

The audio processing apparatus according to claim 1, wherein the detection unit uses, as the signal based on the input signal, the signal obtained after time-domain filter processing has been performed on the input signal.

The audio processing apparatus according to claim 1, further comprising a setting unit that sets the ratio detected by the detection unit in the filter processing unit.

An audio processing method executed by a computer, the method comprising:
performing frequency-domain filter processing on an input signal using window function processing in which analysis frames overlap at a predetermined ratio; and
each time the ratio is increased, calculating the similarity between the signal after the filter processing and a signal based on the input signal, and detecting, based on the calculated similarities, the ratio to be set for the filter processing.

An audio processing program that causes a computer to execute processing comprising:
performing frequency-domain filter processing on an input signal using window function processing in which analysis frames overlap at a predetermined ratio; and
each time the ratio is increased, calculating the similarity between the signal after the filter processing and a signal based on the input signal, and detecting, based on the calculated similarities, the ratio to be set for the filter processing.