JP5251808B2 - Noise removal device

Info

Publication number: JP5251808B2
Application number: JP2009219436A
Authority: JP
Inventors: 利知金岡
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2009-09-24
Filing date: 2009-09-24
Publication date: 2013-07-31
Anticipated expiration: 2029-09-24
Also published as: JP2011069901A

Abstract

PROBLEM TO BE SOLVED: To accurately remove noise from input voice.

SOLUTION: A removal unit removes the component of a second voice from a first signal, obtained from a first input unit to which a first voice is mainly input, by using a filter coefficient that is updated according to a second signal obtained from a second input unit to which the second voice is mainly input. A mixing detection unit detects that the first voice component is mixed into the second signal when it determines both that a correlation value indicating the correlation between the first signal and the second signal exceeds a predetermined threshold and that the difference between the output level of the first signal and that of the second signal exceeds a predetermined threshold. A control unit stops the updating of the filter coefficient when the mixing detection unit detects that the first voice component is included in the second signal.

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to a noise removal device.

Noise removal techniques have long been used to improve the recognition accuracy of a voice signal acquired by a voice input device such as a microphone, for example by removing from the acquired voice signal a signal that corresponds to ambient noise. A noise removal device that implements such a technique may be applied to a service robot that provides interpersonal services through voice communication in a public space such as a shopping center, or to a hands-free telephone.

A conventional noise removal device has a voice acquisition microphone (MIC_S) and a noise acquisition microphone (MIC_N). When applied to a hands-free telephone, the conventional noise removal device treats two kinds of sound as targets for removal: the received voice emitted from the device's own voice reproduction speaker, and environmental sounds such as announcements and background music (BGM) playing in the environment surrounding the device.

For example, to remove the received voice from the voice acquired by the voice acquisition microphone, the conventional noise removal device performs echo cancellation: it feeds the voice reproduction output signal back to an echo cancellation unit and thereby removes the received voice signal from the voice input signal. That is, the conventional noise removal device uses an adaptive filter for echo cancellation to subtract the noise component from the voice input signal.

Likewise, to remove environmental sound from the voice acquired by the voice acquisition microphone, the conventional noise removal device performs noise cancellation: it inputs the environmental sound signal acquired by the noise acquisition microphone (MIC_N) to a noise cancellation unit and thereby removes the environmental sound signal from the voice input signal. That is, the conventional noise removal device uses an adaptive filter for noise cancellation to subtract the noise component from the voice input signal.

The conventional noise removal device, however, has the following two problems when a noise acquisition microphone is used. First, the received voice is input not only to the voice acquisition microphone but also to the noise acquisition microphone, which makes the coefficient estimation of the adaptive filter for noise cancellation or echo cancellation unstable. Second, the voice intended for the voice acquisition microphone leaks into the noise acquisition microphone, so that the noise cancellation process degrades the voice input signal.

To address these two problems, a technique has been proposed that controls the coefficient update of the adaptive filter by using a voice output detector, which detects the presence or absence of a received voice, and a voice input detector, which detects the presence or absence of voice input. For example, when the voice output detector determines that the power of the received voice exceeds a predetermined threshold, it detects that a received voice has been input. Similarly, when the voice input detector determines that the power of the voice input exceeds a predetermined threshold, it detects that voice input has occurred. When voice input is detected, the coefficient update of the adaptive filter is stopped, which prevents degradation of the voice input signal.

Japanese Laid-open Patent Publication No. 06-14101 (JP-A-6-14101)

In the above technique for controlling the update of the adaptive filter, an appropriate threshold for voice input detection has to be set in accordance with changes in the volume of the received voice, the volume of the voice input, and the noise level, and detection accuracy becomes a problem. Because the volume of the received voice, the volume of the voice input, and the noise level change constantly, it is extremely difficult to set a threshold that is appropriate for every situation the noise removal device may encounter. The detection accuracy of the voice input therefore drops sharply, and the voice input signal is degraded as a result.

The disclosed technology has been made in view of the above, and its object is to provide a noise removal device capable of accurately removing noise from input voice.

In one aspect, the technology disclosed in the present application includes a removal unit, a mixing detection unit, and a control unit. The removal unit removes the component of a second voice from a first signal, acquired from a first input unit to which a first voice is mainly input, by using a filter coefficient that is updated based on a second signal acquired from a second input unit to which the second voice is mainly input. The mixing detection unit calculates a correlation value indicating the correlation between the first signal and the second signal and determines whether the calculated correlation value exceeds a predetermined threshold. It also compares the output level of the first signal with the output level of the second signal and determines whether the difference between the two output levels exceeds a predetermined threshold. When it determines both that the correlation value exceeds its threshold and that the output level difference exceeds its threshold, it detects that the first voice component is mixed into the second signal. The control unit stops the update of the filter coefficient when the mixing detection unit detects that the first voice component is mixed into the second signal.

According to one aspect of the technology disclosed in the present application, noise can be accurately removed from input voice.

FIG. 1 is a diagram illustrating the noise removal device according to the first embodiment.
FIG. 2 is a diagram illustrating a configuration according to the second embodiment.
FIG. 3 is a diagram illustrating the configuration of the voice input detector according to the second embodiment.
FIG. 4 is a diagram illustrating an example of a voice input flag determination table according to the second embodiment.
FIG. 5 is a diagram illustrating an example of a filter coefficient update execution determination table according to the second embodiment.
FIG. 6 is a diagram illustrating an example of a filter coefficient update execution determination table according to the second embodiment.
FIG. 7 is a diagram illustrating a flow of processing by the noise removal device according to the second embodiment.
FIG. 8 is a diagram illustrating a flow of processing by the noise removal device according to the second embodiment.
FIG. 9 is a diagram illustrating a flow of processing by the noise removal device according to the second embodiment.
FIG. 10 is a diagram illustrating a configuration according to the third embodiment.
FIG. 11 is a diagram illustrating a configuration according to the fourth embodiment.
FIG. 12 is a diagram illustrating the configuration of the received voice input detector according to the fourth embodiment.
FIG. 13 is a diagram illustrating a flow of processing by the noise removal device according to the fourth embodiment.
FIG. 14 is a diagram illustrating a flow of processing by the noise removal device according to the fourth embodiment.
FIG. 15 is a diagram illustrating a flow of processing by the noise removal device according to the fourth embodiment.
FIG. 16 is a diagram illustrating a configuration according to the fifth embodiment.
FIG. 17 is a diagram illustrating a configuration according to the sixth embodiment.
FIG. 18 is a diagram for explaining the operation of the noise removal device according to the sixth embodiment.

Embodiments of the noise removal device disclosed in the present application are described in detail below with reference to the drawings. The technology disclosed in the present application is not limited by the embodiments described below.

FIG. 1 is a diagram illustrating the noise removal device according to the first embodiment. As illustrated in FIG. 1, the noise removal device 1 according to the first embodiment includes a removal unit 4, a mixing detection unit 5, and a control unit 6.

The first input unit 2 shown in FIG. 1 mainly receives the first voice. The second input unit 3 shown in FIG. 1 mainly receives the second voice.

The removal unit 4 shown in FIG. 1 removes the component of the second voice from the first signal acquired from the first input unit 2, using a filter coefficient that is updated based on the second signal acquired from the second input unit 3.

The mixing detection unit 5 shown in FIG. 1 calculates a correlation value indicating the correlation between the first signal and the second signal and determines whether the calculated correlation value exceeds a predetermined threshold. The mixing detection unit 5 also compares the output level of the first signal with the output level of the second signal and determines whether the difference between the two output levels exceeds a predetermined threshold. When the mixing detection unit 5 determines both that the correlation value exceeds its threshold and that the output level difference exceeds its threshold, it detects that the first voice component is mixed into the second signal.

The control unit 6 shown in FIG. 1 stops the update of the filter coefficient when the mixing detection unit 5 detects that the first voice component is included in the second signal.

That is, the noise removal device according to the first embodiment stops updating the filter coefficient when the first voice component is mixed into the second signal. In doing so, it determines whether the first voice component is mixed into the second signal by checking whether the degree of correlation between the first signal and the second signal is high. This improves the accuracy of detecting whether the first voice component is mixed into the second signal, and avoids updating the filter coefficient while the first voice component is mixed into the second signal. As a result, the second voice component can be accurately removed from the first voice.
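The gating behaviour described for the first embodiment can be illustrated with a minimal sketch. The patent text does not name an adaptation algorithm, so the normalized LMS (NLMS) update used here is an assumption, as are the function name `nlms_cancel` and the parameters `taps`, `mu`, and `eps`. The `freeze` sequence stands in for the output of the mixing detection unit.

```python
def nlms_cancel(primary, reference, freeze, taps=8, mu=0.5, eps=1e-8):
    """Subtract the adaptively filtered reference from the primary signal.

    primary   -- first signal (mainly the first voice)
    reference -- second signal (mainly the second voice, i.e. the noise)
    freeze    -- per-sample booleans; True means the first voice is judged
                 to be mixed into the reference, so coefficients are held
    NLMS is an assumed choice; the source only says "filter coefficient".
    """
    w = [0.0] * taps            # filter coefficients
    history = [0.0] * taps      # recent reference samples, newest first
    out = []
    for d, x, hold in zip(primary, reference, freeze):
        history = [x] + history[:-1]
        y = sum(wi * xi for wi, xi in zip(w, history))  # noise estimate
        e = d - y                                       # cleaned sample
        out.append(e)
        if not hold:
            # Coefficient update is skipped while mixing is detected.
            norm = sum(xi * xi for xi in history) + eps
            w = [wi + mu * e * xi / norm for wi, xi in zip(w, history)]
    return out, w
```

With `freeze` all `False` the filter adapts on every sample; with `freeze` all `True` the coefficients stay at their initial values and the primary signal passes through unchanged.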

[Configuration of the noise removal device (second embodiment)]
FIG. 2 is a diagram illustrating a configuration according to the second embodiment. The following describes an embodiment in which the noise removal device according to the second embodiment is applied to a service providing robot that provides interpersonal services through voice communication in a public space such as a shopping center.

As shown in FIG. 2, the service providing robot 100 includes a voice acquisition microphone (MIC_S) 110, a noise acquisition microphone (MIC_N) 120, and a voice reproduction speaker 130. As also shown in FIG. 2, the service providing robot 100 further includes A/Ds (analog-to-digital converters) 140a to 140c, a D/A (digital-to-analog converter) 150, a voice recognition unit 160, and a robot controller 170.

The voice acquisition microphone 110 mainly receives speech uttered by a user of the service providing robot 100. The noise acquisition microphone 120 mainly receives environmental sounds other than the user's speech, such as announcements and BGM playing in the environment surrounding the service providing robot 100. The voice reproduction speaker 130 outputs the voice reproduced by the service providing robot 100 toward the user.

The A/D 140a converts the analog voice signal input via the voice acquisition microphone 110 into a digital voice signal and outputs it to the noise removal device 200. The A/D 140b converts the analog noise signal input via the noise acquisition microphone 120 into a digital noise signal and outputs it to the noise removal device 200. The A/D 140c converts the analog reproduced voice signal input via the D/A 150, described later, into a digital reproduced voice signal and outputs it to the noise removal device 200.

The voice recognition unit 160 performs recognition processing on the voice signal output from the noise removal device 200 and sends the recognition result to the robot controller 170. For example, the voice recognition unit 160 uses a known voice recognition method to extract words and phrases contained in the user's speech, and sends the extracted words and phrases to the robot controller 170 as the recognition result.

The robot controller 170 generates a digital reproduced voice signal according to the voice recognition result sent from the voice recognition unit 160 and sends the generated signal to the D/A 150. For example, the robot controller 170 identifies a response (words or phrases) corresponding to the words or phrases sent from the voice recognition unit 160 and generates a reproduced voice signal that plays back the identified response. When it sends a reproduced voice signal to the D/A 150, the robot controller 170 also outputs a voice reproduction flag, which indicates that voice whose sound source is the service providing robot 100 is being reproduced, to the noise removal device 200 (filter coefficient estimators 222 and 232) described later. For example, the robot controller 170 outputs "True" (voice being reproduced) as the voice reproduction flag while it is in the voice reproduction state, and "False" (no voice being reproduced) while it is not.

The D/A 150 converts the digital reproduced voice signal sent from the robot controller 170 into an analog signal and sends it to the voice reproduction speaker 130.

The noise removal device 200 uses the noise cancellation unit 220 and the echo cancellation unit 230, described later, to remove the noise component and the reproduced voice component from the voice signal output from the A/D 140a, and outputs the resulting voice signal to the voice recognition unit 160. As shown in FIG. 2, the noise removal device 200 includes, for example, a voice input detector 210, a noise cancellation unit 220, and an echo cancellation unit 230.

The voice input detector 210 uses the voice signal output from the A/D 140a and the noise signal output from the A/D 140b to detect whether the voice signal is mixed into the noise signal. It then sends a voice input flag, which indicates whether the voice signal is mixed into the noise signal, to the noise cancellation unit 220 (filter coefficient estimator 222) described later.

FIG. 3 is a diagram illustrating the configuration of the voice input detector according to the second embodiment. As shown in FIG. 3, the voice input detector 210 includes, for example, delay taps 211a and 211b, frame division processing units 212a and 212b, a cross-correlation detector 213, a signal level comparator 214, and a flag generator 215.

The delay taps 211a and 211b compensate for known delays (for example, delay differences between the A/Ds or delays in the transmission paths). For example, the delay tap 211a adjusts the voice signal output from the A/D 140a for the delay difference in the A/D 140a and the like, and sends it to the frame division processing unit 212a. Similarly, the delay tap 211b adjusts the noise signal output from the A/D 140b for the delay difference in the A/D 140b and the like, and sends it to the frame division processing unit 212b.

The frame division processing units 212a and 212b divide the signals sent from the delay taps 211a and 211b into frames and send them to the cross-correlation detector 213 and the signal level comparator 214. For example, the frame division processing unit 212a sequentially divides the voice signal sent from the delay tap 211a into frames of a fixed number of samples (for example, 512 samples per frame). All processing downstream of the frame division processing units 212a and 212b is performed frame by frame.

The frame division processing unit 212a then sends each divided voice signal to the cross-correlation detector 213 and the signal level comparator 214. Similarly, the frame division processing unit 212b sequentially divides the noise signal sent from the delay tap 211b into frames of a fixed number of samples (for example, 512 samples per frame), and sends each divided signal to the cross-correlation detector 213 and the signal level comparator 214.
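As a rough illustration of the frame division step, the sketch below splits a sample sequence into consecutive 512-sample frames. The function name `split_frames` is hypothetical, and discarding a trailing partial frame is an assumption; the source does not say how an incomplete final frame is handled.

```python
def split_frames(samples, frame_len=512):
    """Yield consecutive, non-overlapping frames of frame_len samples.

    A trailing partial frame is dropped (an assumption of this sketch).
    """
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        yield samples[start:start + frame_len]
```

The same function would be applied to the voice signal and to the noise signal so that frames with the same index cover the same time span.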

The cross-correlation detector 213 calculates a cross-correlation value indicating the correlation between the voice signal sent from the frame division processing unit 212a and the noise signal sent from the frame division processing unit 212b.

For example, the cross-correlation detector 213 superimposes the voice signal and the noise signal at a predetermined phase and computes the cross-correlation between them over a range of 50 samples before and after that phase. The cross-correlation detector 213 then acquires correlation maximum position information, that is, the position at which the calculated cross-correlation value is largest, and determines whether the maximum cross-correlation value is larger than a predetermined threshold. If the maximum is larger than the threshold, the cross-correlation detector 213 sends "True" (correlated) as correlation presence/absence information to the flag generator 215 described later. If the maximum is smaller than the threshold, it sends "False" (not correlated) as correlation presence/absence information to the flag generator 215.
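The lag search described above might be sketched as follows. The ±50-sample range and the threshold comparison come from the text; normalizing the correlation by the frame energies, the threshold value 0.5, and the names `max_cross_correlation` and `correlation_flag` are assumptions made for this illustration.

```python
import math


def max_cross_correlation(voice, noise, max_lag=50):
    """Return (peak value, lag) of the normalized cross-correlation.

    Searches lags from -max_lag to +max_lag. Normalization by the frame
    energies is an assumption; the source does not specify scaling.
    """
    n = len(voice)
    norm = math.sqrt(sum(v * v for v in voice) * sum(x * x for x in noise))
    if norm == 0.0:
        return 0.0, 0
    best_value, best_lag = -1.0, 0
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                acc += voice[i] * noise[j]
        value = acc / norm
        if value > best_value:
            best_value, best_lag = value, lag
    return best_value, best_lag


def correlation_flag(voice, noise, threshold=0.5, max_lag=50):
    """Return (correlation present?, lag of the correlation peak)."""
    value, lag = max_cross_correlation(voice, noise, max_lag)
    return value > threshold, lag
```

The returned lag plays the role of the correlation maximum position information that is passed on to the level comparison.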

The signal level comparator 214 compares the signal level (for example, the power value) of the voice signal sent from the frame division processing unit 212a with the signal level (for example, the power value) of the noise signal sent from the frame division processing unit 212b. As shown in FIG. 3, the signal level comparator 214 includes, for example, a mean-square calculator 214a and a power comparator 214b.

The mean-square calculator 214a squares the voltage values of the voice signal sent from the frame division processing unit 212a and of the noise signal sent from the frame division processing unit 212b to calculate the power values of the voice signal and the noise signal. It then sends these power values to the power comparator 214b described later.

The power comparator 214b superimposes the voice signal and the noise signal sent from the mean-square calculator 214a at the phase corresponding to the correlation maximum position information acquired by the cross-correlation detector 213. With the two signals aligned at that phase, the power comparator 214b calculates the difference between the average power value of the voice signal and the average power value of the noise signal, and determines whether the calculated difference is larger than a predetermined threshold. If the difference is larger than the threshold, the power comparator 214b sends "True" (level difference present) as level comparison information to the flag generator 215 described later. If the difference is equal to or smaller than the threshold, it sends "False" (no level difference) as level comparison information to the flag generator 215. The averages are computed frame by frame.
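A minimal sketch of the level comparison, under stated assumptions: the difference is taken as voice power minus noise power on a linear scale, the threshold value 0.01 is arbitrary, and `level_flag` is a hypothetical name. The source does not say whether the difference is signed or expressed in decibels.

```python
def level_flag(voice, noise, lag=0, threshold=0.01):
    """Compare mean power of two frames aligned at the given lag.

    Returns True when (mean voice power - mean noise power) exceeds the
    threshold. The signed, linear-scale difference is an assumption.
    """
    pairs = [(voice[i], noise[i + lag])
             for i in range(len(voice))
             if 0 <= i + lag < len(noise)]
    if not pairs:
        return False
    voice_power = sum(v * v for v, _ in pairs) / len(pairs)
    noise_power = sum(x * x for _, x in pairs) / len(pairs)
    return (voice_power - noise_power) > threshold
```

Only the samples that overlap after alignment contribute to the two averages.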

The flag generator 215 holds a determination table that defines in advance how each combination of the correlation presence/absence information sent from the cross-correlation detector 213 and the level comparison information sent from the signal level comparator 214 maps to the content of the voice input flag to be generated. Following this determination table, the flag generator 215 generates a voice input flag indicating whether the voice signal is mixed into the noise signal and sends the generated flag to the noise cancellation unit 220 described later. The voice input flag is output per frame; for example, when each frame consists of 512 samples, the same flag value applies to all 512 samples of the frame.

FIG. 4 is a diagram illustrating an example of a voice input flag determination table according to the second embodiment. As shown in FIG. 4, the leftmost column of the determination table holds the correlation presence/absence information (maximum cross-correlation value > threshold) sent from the cross-correlation detector 213. "True" indicates that the noise signal and the voice signal are correlated, and "False" indicates that they are not.

As also shown in FIG. 4, the middle column of the determination table holds the level comparison information (mean-square difference > threshold) sent from the signal level comparator 214. "True" indicates that there is a level difference between the noise signal and the voice signal, and "False" indicates that there is none.

The rightmost column of the determination table in FIG. 4 holds the value of the voice input flag that the flag generator 215 is to generate. "True" indicates that the voice signal is mixed into the noise signal, and "False" indicates that it is not.

As shown in FIG. 4, the determination table is defined so that the voice input flag to be generated is “True” only when the correlation presence/absence information sent from the cross-correlation detector 213 and the level comparison information sent from the signal level comparator 214 are both “True”.

For example, when the correlation presence/absence information and the level comparison information sent from the cross-correlation detector 213 and the signal level comparator 214 are both “True”, the flag generator 215 follows the information defined in the top row of the determination table shown in FIG. 4. That is, the flag generator 215 generates a voice input flag of “True”, indicating that the voice signal is mixed into the noise signal.

Likewise, when the correlation presence/absence information and the level comparison information are both “False”, the flag generator 215 follows the information defined in the fourth row of the determination table shown in FIG. 4. That is, the flag generator 215 generates a voice input flag of “False”, indicating that the voice signal is not mixed into the noise signal.

Likewise, when only one of the correlation presence/absence information and the level comparison information is “False”, the flag generator 215 follows the information defined in the second or third row of the determination table shown in FIG. 4. That is, the flag generator 215 generates a voice input flag of “False”, indicating that the voice signal is not mixed into the noise signal.
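The determination table described above amounts to a logical AND of the two inputs. The following is a minimal sketch of that lookup, not the patented implementation; the names `DETERMINATION_TABLE` and `generate_voice_input_flag` are hypothetical, and the order of the two mixed rows is an assumption because the text does not say which of them is row 2 and which is row 3.

```python
# Hypothetical encoding of the FIG. 4 determination table.
# Key: (correlation present, level difference present) -> voice input flag.
DETERMINATION_TABLE = {
    (True, True): True,     # row 1: voice signal is mixed into the noise signal
    (True, False): False,   # rows 2 and 3 (order assumed): no mixing
    (False, True): False,
    (False, False): False,  # row 4: no mixing
}


def generate_voice_input_flag(correlation_present: bool,
                              level_difference_present: bool) -> bool:
    """Look up the voice input flag for the two detector outputs."""
    return DETERMINATION_TABLE[(correlation_present, level_difference_present)]
```

A table lookup is used here instead of a bare `and` so that the sketch mirrors the row structure of the figure.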

As shown in FIG. 2, the noise cancellation unit 220 includes an FIR (finite impulse response) filter 221 and a filter coefficient estimator 222.

The FIR filter 221 uses the noise cancellation (NC) filter coefficients sent from the filter coefficient estimator 222 to remove the noise component from the voice signal output from the A/D 140a. The NC filter coefficients serve as the coefficients of the transfer function that adapts the noise signal so that the noise component contained in the voice signal is driven to “0”.

The filter coefficient estimator 222 updates the noise cancellation (NC) filter coefficients based on the noise signal output from the A/D 140b and sends the updated filter coefficients to the FIR filter 221. The filter coefficient estimator 222 also controls the updating of the NC filter coefficients based on the voice input flag sent from the voice input detector 210 and the audio playback flag sent from the robot controller 170.

For example, the filter coefficient estimator 222 has an execution determination table that defines in advance, for each value of the audio playback flag, the audio playback state the flag indicates and the action to take in that state. The filter coefficient estimator 222 updates the noise cancellation filter coefficients according to this execution determination table.

FIG. 5 shows an example of the execution determination table for filter coefficient updating according to the second embodiment. As shown in the figure, the leftmost column of the execution determination table holds the type of the audio playback flag, the middle column holds the audio playback state indicated by the flag, and the rightmost column holds the corresponding action of the filter coefficient estimator 222. For example, the audio playback flag “True” is associated with the audio playback state “audio playback” and the action “stop coefficient update”. Similarly, the audio playback flag “False” is associated with the audio playback state “no audio playback” and the action “update coefficients”. The audio playback states (“audio playback”, “no audio playback”) do not necessarily have to be defined in the execution determination table.

For example, if the filter coefficient estimator 222 has already received the audio playback flag “True” at the timing for updating the filter coefficients, it determines from the execution determination table shown in FIG. 5 that audio is being played back. The filter coefficient estimator 222 then decides, according to the same table, to stop updating the noise cancellation (NC) filter coefficients. Conversely, if the filter coefficient estimator 222 has already received the audio playback flag “False” at the timing for updating the filter coefficients, it determines from the execution determination table that no audio is being played back. The filter coefficient estimator 222 then updates the NC filter coefficients according to the same table.

The filter coefficient estimator 222 also has a second execution determination table that defines in advance, for each value of the voice input flag, the voice input state the flag indicates (whether the voice signal is mixed into the noise signal) and the action to take in that state. The filter coefficient estimator 222 updates the noise cancellation filter coefficients according to this execution determination table.

FIG. 6 shows an example of this second execution determination table for filter coefficient updating according to the second embodiment. As shown in the figure, the leftmost column holds the type of the voice input flag, the middle column holds the voice input mixing state, which indicates whether the voice signal is mixed into the noise signal, and the rightmost column holds the corresponding action of the filter coefficient estimator 222. For example, the voice input flag “True” is associated with the voice input mixing state “voice input mixed in” and the action “stop coefficient update”. Similarly, the voice input flag “False” is associated with the voice input mixing state “no voice input mixed in” and the action “update coefficients”.

For example, if the filter coefficient estimator 222 has already received the voice input flag “True” at the timing for updating the filter coefficients, it determines from the execution determination table shown in FIG. 6 that voice input is mixed in (the voice signal is mixed into the noise signal). The filter coefficient estimator 222 then decides, according to the same table, to stop updating the noise cancellation (NC) filter coefficients. Conversely, if the filter coefficient estimator 222 has already received the voice input flag “False” at the timing for updating the filter coefficients, it determines from the execution determination table that no voice input is mixed in (the voice signal is not mixed into the noise signal). The filter coefficient estimator 222 then updates the NC filter coefficients according to the same table.

To summarize the filter coefficient updating described above: if, at the timing for updating the filter coefficients, the filter coefficient estimator 222 has already received at least one of the audio playback flag “True” and the voice input flag “True”, it stops updating the filter coefficients.
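The two execution determination tables and the summary rule can be sketched as follows. This is an illustrative encoding under stated assumptions: the dictionary names, the English action strings, and the function `nc_update_allowed` are hypothetical and do not appear in the original.

```python
# Hypothetical encodings of the FIG. 5 and FIG. 6 execution determination tables.
# Each flag value maps to (state, action of the filter coefficient estimator 222).
STOP = "stop coefficient update"
UPDATE = "update coefficients"

PLAYBACK_TABLE = {
    True: ("audio playback", STOP),
    False: ("no audio playback", UPDATE),
}
VOICE_INPUT_TABLE = {
    True: ("voice input mixed in", STOP),
    False: ("no voice input mixed in", UPDATE),
}


def nc_update_allowed(playback_flag: bool, voice_input_flag: bool) -> bool:
    """NC coefficients are updated only if neither table says to stop."""
    actions = (PLAYBACK_TABLE[playback_flag][1],
               VOICE_INPUT_TABLE[voice_input_flag][1])
    return STOP not in actions
```

With both flags “False” the function returns `True`; any other combination returns `False`, which matches the summary rule.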

When the filter coefficient estimator 222 has stopped updating the noise cancellation (NC) filter coefficients, it sends the previously used filter coefficients unchanged to the FIR filter 221. While updating is stopped by the filter coefficient estimator 222, the FIR filter 221 removes the noise component from the voice signal using the previously used filter coefficients.
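The hold-the-previous-coefficients behavior can be illustrated with a small adaptive filter. This section does not state which estimation algorithm the filter coefficient estimator 222 uses, so the sketch below substitutes a normalized LMS (NLMS) update purely as an assumption; the class name, tap count, and step size are likewise hypothetical. The caller is assumed to pass `update_enabled=False` whenever either flag is “True”.

```python
from collections import deque


class GatedNoiseCanceller:
    """Sketch of FIR filter 221 plus estimator 222 (NLMS is an assumption)."""

    def __init__(self, taps: int = 4, step_size: float = 0.5, eps: float = 1e-8):
        self.coeffs = [0.0] * taps                        # NC filter coefficients
        self.history = deque([0.0] * taps, maxlen=taps)   # recent noise samples
        self.step_size = step_size
        self.eps = eps                                    # avoids division by zero

    def process(self, voice: float, noise: float, update_enabled: bool) -> float:
        """Filter one sample; adapt the coefficients only if update_enabled."""
        self.history.appendleft(noise)
        estimate = sum(w * x for w, x in zip(self.coeffs, self.history))
        residual = voice - estimate   # voice sample with estimated noise removed
        if update_enabled:
            power = sum(x * x for x in self.history) + self.eps
            gain = self.step_size * residual / power
            self.coeffs = [w + gain * x for w, x in zip(self.coeffs, self.history)]
        # When update_enabled is False the previous coefficients stay in use.
        return residual
```

Freezing rather than resetting the coefficients means the filter keeps cancelling with its last noise-path estimate while the reference input is unreliable.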

As shown in FIG. 2, the echo cancellation unit 230 includes an FIR (finite impulse response) filter 231 and a filter coefficient estimator 232.

The FIR filter 231 uses the echo cancellation (EC) filter coefficients sent from the filter coefficient estimator 232 to remove the playback audio component from the voice signal output from the A/D 140a. The EC filter coefficients serve as the coefficients of the transfer function that adapts the playback audio signal so that the playback audio component contained in the voice signal is driven to “0”.

The filter coefficient estimator 232 updates the echo cancellation (EC) filter coefficients based on the playback audio signal output from the A/D 140c and sends the updated filter coefficients to the FIR filter 231. The filter coefficient estimator 232 also controls the updating of the EC filter coefficients based on the audio playback flag sent from the robot controller 170. Although not shown in the figures, the filter coefficient estimator 232 has an execution determination table similar to that of the filter coefficient estimator 222 described above (see, for example, FIG. 5).

For example, if the filter coefficient estimator 232 has already received the audio playback flag “True” at the timing for updating the filter coefficients, it determines that audio is being played back and updates the filter coefficients. If it has already received the audio playback flag “False” at that timing, it determines that no audio is being played back and stops updating the filter coefficients.

When the filter coefficient estimator 232 has stopped updating the echo cancellation (EC) filter coefficients, it sends the previously used filter coefficients unchanged to the FIR filter 231. While updating is stopped by the filter coefficient estimator 232, the FIR filter 231 removes the playback audio component from the voice signal using the previously used filter coefficients.
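Note that the echo cancellation rule is the reverse of the noise cancellation rule: the EC coefficients adapt only while audio is being played back. A minimal sketch of that selection follows; the function name and the `estimate` callback are hypothetical placeholders for whatever estimation method the filter coefficient estimator 232 actually uses.

```python
from typing import Callable, List


def next_ec_coefficients(previous: List[float],
                         playback_flag: bool,
                         estimate: Callable[[List[float]], List[float]]) -> List[float]:
    """Return the EC coefficients to send to FIR filter 231 at this update timing."""
    if playback_flag:
        # Audio is being played back, so a reference signal exists: adapt.
        return estimate(previous)
    # No playback: resend the previously used coefficients unchanged.
    return list(previous)
```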

The voice input detector 210, the noise cancellation unit 220, and the echo cancellation unit 230 are, for example, integrated circuits such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array). Alternatively, they may be electronic circuits such as a CPU (Central Processing Unit) or an MPU (Micro Processing Unit).

[Processing by the Noise Removal Device (Second Embodiment)]
FIGS. 7 to 9 show the flow of processing by the noise removal device according to the second embodiment. First, the flow of processing by the voice input detector 210 is described with reference to FIG. 7. As shown in the figure, when the voice input detector 210 receives the voice signal output from the A/D 140a and the noise signal output from the A/D 140b (Yes in step S1), it detects whether the voice signal is mixed into the noise signal (step S2).

If the voice input detector 210 determines that the voice signal is mixed into the noise signal (Yes in step S2), it sends the voice input flag “True”, indicating that the voice signal is mixed into the noise signal, to the noise cancellation unit 220 (step S3). If it determines that the voice signal is not mixed into the noise signal (No in step S2), it sends the voice input flag “False”, indicating that the voice signal is not mixed into the noise signal, to the noise cancellation unit 220 (step S4).

Next, the flow of processing by the noise cancellation unit 220 is described with reference to FIG. 8. As shown in the figure, if at least one of the voice input flag and the audio playback flag already received is “True” at the timing for updating the filter coefficients (Yes in step S1), the filter coefficient estimator 222 operates as follows. It stops updating the noise cancellation (NC) filter coefficients (step S2) and sends the previously used NC filter coefficients to the FIR filter 221 (step S3). As a result, while updating is stopped by the filter coefficient estimator 222, the FIR filter 221 removes the noise component from the voice signal using the previously used filter coefficients.

Returning to step S1: if neither the voice input flag nor the audio playback flag already received is “True” (that is, both are “False”) at the timing for updating the filter coefficients (No in step S1), the filter coefficient estimator 222 operates as follows. It updates the noise cancellation (NC) filter coefficients (step S4) and sends the updated NC filter coefficients to the FIR filter 221 (step S5). As a result, the FIR filter 221 removes the noise component from the voice signal using the filter coefficients updated by the filter coefficient estimator 222.

Next, the flow of processing by the echo cancellation unit 230 is described with reference to FIG. 9. As shown in the figure, if the audio playback flag already received is “True” at the timing for updating the filter coefficients (Yes in step S1), the filter coefficient estimator 232 operates as follows. It updates the echo cancellation (EC) filter coefficients (step S2) and sends the updated EC filter coefficients to the FIR filter 231 (step S3). As a result, the FIR filter 231 removes the playback audio component from the voice signal using the filter coefficients updated by the filter coefficient estimator 232.

Returning to step S1: if the audio playback flag already received is not “True” (that is, it is “False”) at the timing for updating the filter coefficients (No in step S1), the filter coefficient estimator 232 operates as follows. It stops updating the echo cancellation (EC) filter coefficients (step S4) and sends the previously used EC filter coefficients to the FIR filter 231 (step S5). As a result, while updating is stopped by the filter coefficient estimator 232, the FIR filter 231 removes the playback audio component from the voice signal using the previously used filter coefficients.

[Effects of the Second Embodiment]
As described above, according to the second embodiment, the noise removal device 200 determines whether the voice signal is mixed into the noise signal and, if it determines that it is, generates the voice input flag “True”. Then, at the timing for updating the noise cancellation filter coefficients, the noise removal device 200 stops updating the filter coefficients if at least one of the audio playback flag and the voice input flag already received is “True”.

In other words, the noise removal device 200 stops updating the filter coefficients when the voice signal is mixed into the noise signal. In doing so, it determines whether the voice signal is mixed into the noise signal by determining whether the degree of correlation between the voice signal and the noise signal is high. This improves the accuracy of detecting whether the voice signal is mixed into the noise signal and avoids updating the filter coefficients while the voice signal is mixed into the noise signal. As a result, the noise component can be removed from the voice signal accurately.

In addition, at the timing for updating the echo cancellation filter coefficients, the noise removal device 200 stops updating the filter coefficients if the audio playback flag already received is “False”.

In other words, the noise removal device 200 can avoid updating the echo cancellation filter coefficients while no audio is being played back. As a result, the playback audio component can be removed from the voice signal accurately.

If echo cancellation is not a required process, the noise removal device 200 of the second embodiment described above does not need to include the echo cancellation unit 230 and may include only the noise cancellation unit 220.

In the second embodiment above, the robot controller 170 may also output the voice input flag to the noise cancellation unit 220 when a person is detected within the range from which voice can be input to the voice acquisition microphone 110.

FIG. 10 shows the configuration according to the third embodiment. As shown in the figure, the service providing robot 100 includes a voice acquisition microphone (MIC_S) 110, a noise acquisition microphone (MIC_N) 120, and an audio playback speaker 130. The service providing robot 100 further includes A/Ds (analog-to-digital converters) 140a to 140c, a D/A (digital-to-analog converter) 150, a voice recognition unit 160, and a robot controller 170. As also shown in the figure, the noise removal device 200 includes a noise cancellation unit 220 and an echo cancellation unit 230.

Here, the noise removal device 200 differs from that of the second embodiment in that it does not include the voice input detector 210. The third embodiment also differs from the second in that the service providing robot 100 additionally includes a person detection unit 180, which detects a person within the range from which voice can be input to the voice acquisition microphone 110.

The person detection unit 180 has a vision sensor (camera) and detects a person within the range from which voice can be input to the voice acquisition microphone 110. For example, when the person detection unit 180 uses the vision sensor (camera) to detect a person within a certain distance (for example, 100 cm) in the pointing direction of the voice acquisition microphone 110, it notifies the robot controller 170 that the voice input flag “True” should be output. The person detection unit 180 performs image recognition on the image data captured by the vision sensor, using existing techniques such as edge extraction and pattern matching, to detect a person within the range from which voice can be input to the voice acquisition microphone 110.

Instead of a vision sensor (camera), the person detection unit 180 may have a distance measurement sensor (distance sensor), such as an infrared sensor or an ultrasonic sensor, mounted at a position from which it can detect an object in the pointing direction of the voice acquisition microphone. In this case, when the distance sensor finds an object within a predetermined distance, the person detection unit 180 measures the distance to the object for a fixed period. If the distance to the object changes during that period, the person detection unit 180 notifies the robot controller 170 that the voice input flag “True” should be output. If the distance to the object does not change during that period, the person detection unit 180 notifies the robot controller 170 that the voice input flag “False” should be output. The person detection unit 180 is not limited to the vision sensor (camera) and distance sensor described above; any device that detects people in real time can be used, and such devices can be used alone or in combination.
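The distance-sensor variant can be sketched as a simple check on a window of readings. This is only an illustration: the function name, the 100 cm default range (borrowed from the camera example), and the tolerance used to decide that the distance "changed" are all assumptions, as is returning “False” when no object is in range, which the text does not specify.

```python
from typing import Sequence


def voice_input_flag_from_distances(distances_cm: Sequence[float],
                                    max_range_cm: float = 100.0,
                                    tolerance_cm: float = 2.0) -> bool:
    """Decide the voice input flag from distance readings taken over a fixed period.

    distances_cm[0] is the reading that triggers the measurement window.
    """
    if not distances_cm or distances_cm[0] > max_range_cm:
        return False  # assumption: no object within range, so no "True" flag
    # A moving object (likely a person) shows a spread larger than sensor jitter.
    return max(distances_cm) - min(distances_cm) > tolerance_cm
```

A static object such as a wall yields a near-constant distance and so maps to “False”, while a person shifting position maps to “True”.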

When the robot controller 170 receives notification from the person detection unit 180 that the voice input flag should be output, it outputs the voice input flag to the noise cancellation unit 220.

When the noise cancellation unit 220 receives the voice input flag from the robot controller 170, it operates in the same way as in the second embodiment described above. That is, if the filter coefficient estimator 222 has already received the voice input flag from the robot controller 170 at the timing for updating the noise cancellation filter coefficients, it stops updating the noise cancellation filter coefficients.

The noise removal device 200 described in the second embodiment can likewise be applied to a hands-free telephone 300.

[Configuration of the Noise Removal Device (Fourth Embodiment)]
FIG. 11 shows the configuration according to the fourth embodiment. As shown in the figure, the hands-free telephone 300 includes a voice acquisition microphone (MIC_S) 310, a noise acquisition microphone (MIC_N) 320, and an audio playback speaker 330. The hands-free telephone 300 further includes the noise removal device 200, A/Ds (analog-to-digital converters) 340a and 340b, and a D/A (digital-to-analog converter) 350. As also shown in the figure, the noise removal device 200 includes a voice input detector 210, a noise cancellation unit 220, an echo cancellation unit 230, and a received voice input detector 240.

Here, as shown in FIG. 11, the noise removal device 200 differs from the noise removal device 200 according to the second embodiment in that it includes the received voice input detector 240.

The received voice input detector 240 detects whether the received voice, which corresponds to the far-end speaker signal output through the audio playback speaker 330, is mixed into the voice signal. The received voice input detector 240 performs the same processing as the voice input detector 210 described in the second embodiment to detect whether the received voice is mixed into the voice signal.

FIG. 12 shows the configuration of the received voice input detector 240 according to the fourth embodiment. As shown in the figure, the received voice input detector 240 includes, for example, delay taps 241a and 241b, frame division processing units 242a and 242b, a cross-correlation detector 243, a signal level comparator 244, and a flag generator 245. Because the processing of the delay taps 241a and 241b, the frame division processing units 242a and 242b, the cross-correlation detector 243, the signal level comparator 244, and the flag generator 245 is the same as in the voice input detector 210 of the second embodiment, it is described only briefly below.

The delay taps 241a and 241b compensate for known delays (for example, a delay difference between the A/D converters or a delay in the transmission path). The frame division processing units 242a and 242b divide the signals sent from the delay taps 241a and 241b into frames and send the frames to the cross-correlation detector 243 and the signal level comparator 244.
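
The delay compensation and frame division described above can be sketched as follows. This is only an illustrative sketch: the function names `apply_delay` and `split_frames`, the zero padding, and the use of non-overlapping fixed-length frames are assumptions, since the description does not specify the frame length or any overlap.

```python
def apply_delay(samples, delay):
    """Delay a signal by `delay` samples (hypothetical stand-in for a delay tap).

    The start is zero-padded and the output keeps the input length.
    """
    if delay <= 0:
        return list(samples)
    return ([0.0] * delay + list(samples))[:len(samples)]


def split_frames(samples, frame_len):
    """Split a signal into non-overlapping frames of `frame_len` samples.

    A trailing partial frame is dropped (an assumption of this sketch).
    """
    last_start = len(samples) - frame_len
    return [list(samples[i:i + frame_len])
            for i in range(0, last_start + 1, frame_len)]
```

In such a sketch, the two input paths would each be delayed by their own known offset before framing, so that frames with the same index cover the same time span.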

The cross-correlation detector 243 calculates a cross-correlation value indicating the correlation between the voice signal sent from the frame division processing unit 242a and the far-end speaker signal sent from the frame division processing unit 242b. Then, according to the result of comparing the maximum cross-correlation value with a threshold, the cross-correlation detector 243 sends correlation presence/absence information of “True (= correlated)” or “False (= not correlated)” to the flag generator 245.

As shown in FIG. 12, the signal level comparator 244 includes, for example, a mean-square calculator 244a and a power comparator 244b. The signal level comparator 244 compares the signal level (for example, the power value) of the voice signal sent from the frame division processing unit 242a with the signal level (for example, the power value) of the far-end speaker signal sent from the frame division processing unit 242b. Then, according to the result of the comparison, the signal level comparator 244 sends level comparison information of “True (= level difference present)” or “False (= no level difference)” to the flag generator 245.

Based on the correlation presence/absence information and the level comparison information sent from the cross-correlation detector 243 and the signal level comparator 244, the flag generator 245 generates a received voice input flag indicating whether the far-end speaker signal is mixed into the voice signal, and sends the generated flag to the noise cancellation unit 220. That is, the flag generator 245 sends to the noise cancellation unit 220 a received voice input flag of “True”, indicating that the far-end speaker signal is mixed into the voice signal, or “False”, indicating that it is not.
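
The detection chain described above (maximum cross-correlation against a threshold, a power comparison, and flag generation) might be sketched as follows. The names, the threshold values, the lag range, the normalization of the correlation, the use of an absolute mean-power difference as the level test, and the logical AND of the two results are all assumptions made for illustration; the description only states that the flag is based on both pieces of information.

```python
import math


def max_normalized_xcorr(x, y, max_lag):
    """Largest normalized cross-correlation of x and y over lags -max_lag..max_lag."""
    ex = math.sqrt(sum(v * v for v in x))
    ey = math.sqrt(sum(v * v for v in y))
    if ex == 0.0 or ey == 0.0:
        return 0.0  # a silent frame is treated as uncorrelated

    def corr(lag):
        acc = sum(x[i] * y[i + lag]
                  for i in range(len(x)) if 0 <= i + lag < len(y))
        return acc / (ex * ey)

    return max(corr(lag) for lag in range(-max_lag, max_lag + 1))


def mean_power(frame):
    """Mean-square value of one frame."""
    return sum(v * v for v in frame) / len(frame)


def received_voice_input_flag(voice, far_end, corr_thresh=0.5,
                              level_diff_thresh=0.01, max_lag=8):
    """Return True when far-end speech is judged to be mixed into the voice frame."""
    correlated = max_normalized_xcorr(voice, far_end, max_lag) > corr_thresh
    level_differs = abs(mean_power(voice) - mean_power(far_end)) > level_diff_thresh
    return correlated and level_differs


# Example: the voice frame is a delayed, attenuated copy of the far-end frame.
far_end = [math.sin(0.3 * i) for i in range(64)]
echo = [0.0] * 3 + [0.5 * v for v in far_end[:-3]]
flag = received_voice_input_flag(echo, far_end)
```

Normalizing the correlation keeps the correlation threshold independent of the absolute signal level, which is why the level comparison is kept as a separate test in this sketch.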

[Processing by the noise removal device (Embodiment 4)]
FIGS. 13 to 15 are diagrams illustrating the flow of processing performed by the noise removal device according to the fourth embodiment. First, the flow of processing by the received voice input detector 240 is described with reference to FIG. 13. As shown in the figure, when the received voice input detector 240 receives the voice signal and the far-end speaker signal output from the A/D 340a (Yes in step S1), it detects whether the far-end speaker signal is mixed into the voice signal (step S2).

When the received voice input detector 240 determines that the far-end speaker signal is mixed into the voice signal (Yes in step S2), it sends to the noise cancellation unit 220 a received voice input flag of “True”, indicating that the far-end speaker signal is mixed into the voice signal (step S3). On the other hand, when the received voice input detector 240 determines that the far-end speaker signal is not mixed into the voice signal (No in step S2), it sends to the noise cancellation unit 220 a received voice input flag of “False”, indicating that the far-end speaker signal is not mixed into the voice signal (step S4).

Next, the flow of processing by the noise cancellation unit 220 is described with reference to FIG. 14. As shown in the figure, when at least one of the voice input flag and the received voice input flag that have been input is “True” at the timing for updating the filter coefficients (Yes in step S1), the filter coefficient estimator 222 operates as follows. The filter coefficient estimator 222 stops updating the filter coefficients for noise cancellation (NC) (step S2) and sends the previously used NC filter coefficients to the FIR filter 221 (step S3). As a result, while the update is stopped, the FIR filter 221 removes the noise component from the voice signal using the previously used filter coefficients.

Returning to step S1: when neither the voice input flag nor the received voice input flag that have been input is “True” (that is, both are “False”) at the timing for updating the filter coefficients (No in step S1), the filter coefficient estimator 222 operates as follows. The filter coefficient estimator 222 updates the filter coefficients for noise cancellation (NC) (step S4) and sends the updated NC filter coefficients to the FIR filter 221 (step S5). As a result, the FIR filter 221 removes the noise component from the voice signal using the filter coefficients updated by the filter coefficient estimator 222.

Note that, as in the second embodiment described above, when a voice reproduction flag can exist (for example, when the hands-free telephone 300 has a content playback function), the noise cancellation unit 220 according to the fourth embodiment may also take the voice reproduction flag into account when controlling the update of the filter coefficients. For example, when at least one of the voice input flag, the received voice input flag, and the voice reproduction flag that have been input is “True” at the filter coefficient update timing, the update of the noise cancellation filter coefficients is stopped.

Next, the flow of processing by the echo cancellation unit 230 is described with reference to FIG. 15. As shown in the figure, when the received voice input flag that has been input is “True” at the timing for updating the filter coefficients (Yes in step S1), the filter coefficient estimator 232 operates as follows. The filter coefficient estimator 232 updates the filter coefficients for echo cancellation (EC) (step S2) and sends the updated EC filter coefficients to the FIR filter 231 (step S3). As a result, the FIR filter 231 removes the received voice component from the voice signal using the filter coefficients updated by the filter coefficient estimator 232.

Returning to step S1: when the received voice input flag that has been input is not “True” (that is, it is “False”) at the timing for updating the filter coefficients (No in step S1), the filter coefficient estimator 232 operates as follows. The filter coefficient estimator 232 stops updating the filter coefficients for echo cancellation (EC) (step S4) and sends the previously used EC filter coefficients to the FIR filter 231 (step S5). As a result, while the update is stopped, the FIR filter 231 removes the received voice component from the voice signal using the previously used filter coefficients.

That is, unlike in the second embodiment described above, the echo cancellation unit 230 according to the fourth embodiment controls the update of the echo cancellation filter coefficients based on the received voice input flag rather than the voice reproduction flag.
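
The update rules of FIGS. 14 and 15 reduce to two small predicates, sketched below. The function names are hypothetical and the coefficient values are placeholders; the sketch only captures which coefficient set is handed to the FIR filter at each update timing, not how the coefficients are estimated.

```python
def nc_update_allowed(voice_input_flag, received_voice_flag, playback_flag=False):
    """NC coefficients are updated only while none of the flags is True."""
    return not (voice_input_flag or received_voice_flag or playback_flag)


def ec_update_allowed(received_voice_flag):
    """EC coefficients are updated only while far-end speech is detected."""
    return received_voice_flag


def select_coefficients(previous, updated, update_allowed):
    """Return the coefficients sent to the FIR filter at this update timing."""
    return updated if update_allowed else previous
```

The two predicates are deliberately asymmetric: noise cancellation adapts only when the reference contains nothing but noise, whereas echo cancellation adapts only when the echo source is actually active.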

[Effects of Embodiment 4]
As described above, according to the fourth embodiment, the noise removal device 200 detects whether the far-end speaker signal is mixed into the voice signal and generates a received voice input flag corresponding to the detection result. The noise removal device 200 then controls the update of the filter coefficients based on the voice input flag and the received voice input flag. For example, when the far-end speaker signal is mixed into the voice signal, the update of the noise cancellation filter coefficients is stopped.

That is, the noise removal device 200 stops updating the filter coefficients when the far-end speaker signal is mixed into the voice signal. Here, the noise removal device 200 determines whether the far-end speaker signal is mixed into the voice signal by determining whether the degree of correlation between the voice signal and the far-end speaker signal is high. This improves the accuracy of detecting whether the far-end speaker signal is mixed into the voice signal, and avoids updating the filter coefficients while the far-end speaker signal is mixed in. As a result, the noise component can be removed from the voice signal accurately.

In addition, because the noise removal device 200 can avoid updating the echo cancellation filter coefficients while the far-end speaker signal is not mixed into the voice signal, it can accurately remove the received voice component from the voice signal.

When a beamforming microphone is used in place of the voice acquisition microphone 110 of the second embodiment described above, the following problem may arise. The characteristics (echo characteristics) of the sound that travels from the voice reproduction speaker 130 into the beamforming microphone differ depending on the directivity direction of the beamforming microphone. Therefore, for example, immediately after the directivity direction of the beamforming microphone is changed, the filter coefficient update in the echo cancellation unit 230 lags behind the change, and the voice signal deteriorates as a result.

The characteristics of the noise input to the beamforming microphone may also differ depending on its directivity direction. Therefore, as with the echo cancellation unit 230 described above, immediately after the directivity direction of the beamforming microphone is changed, the filter coefficient update in the noise cancellation unit 220 may also lag behind the change, and the voice signal may deteriorate.

The fifth embodiment below therefore describes a way of dealing with the delay in filter coefficient tracking that arises when a beamforming microphone is used in place of the voice acquisition microphone 110.

[Configuration of the noise removal device (Embodiment 5)]
FIG. 16 is a diagram illustrating a configuration according to the fifth embodiment. As shown in the figure, the service providing robot 100 includes a voice acquisition beamforming microphone (MIC_SB) 191, a noise acquisition microphone (MIC_N) 120, and a voice reproduction speaker 130. The service providing robot 100 further includes A/Ds 140a to 140c, a D/A 150, a voice recognition unit 160, a robot controller 170, a person detection unit 180, and an array microphone control unit 192.

The voice acquisition beamforming microphone 191 is directional and mainly receives speech uttered by a user of the service providing robot 100.

The noise acquisition microphone 120 mainly receives environmental sounds other than the user's speech, such as announcements and background music (BGM) playing in the environment surrounding the service providing robot 100. The voice reproduction speaker 130 outputs the voice reproduced by the service providing robot 100 toward the user.

The A/D 140a converts the analog voice signal input via the voice acquisition beamforming microphone 191 into a digital voice signal and outputs it to the noise removal device 200. The A/D 140b converts the analog noise signal input via the noise acquisition microphone 120 into a digital noise signal and outputs it to the noise removal device 200. The A/D 140c converts the analog reproduced voice signal input via the D/A 150, described later, into a digital reproduced voice signal and outputs it to the noise removal device 200.

The voice recognition unit 160 performs recognition processing on the voice signal output from the noise removal device 200 and sends the recognition result to the robot controller 170.

The robot controller 170 generates a digital reproduced voice signal according to the voice recognition result sent from the voice recognition unit 160 and sends the generated signal to the D/A 150. When sending the reproduced voice signal to the D/A 150, the robot controller 170 also outputs a voice reproduction flag, indicating that voice originating from the service providing robot 100 is being reproduced, to the noise removal device 200 (filter coefficient estimators 222 and 232) described later. For example, the robot controller 170 outputs “True (= voice being reproduced)” as the voice reproduction flag while voice is being reproduced, and “False (= no voice being reproduced)” while it is not.

In addition, the robot controller 170 sends a directivity direction setting instruction to the array microphone control unit 192 according to the position information sent from the person detection unit 180 described later (the position information of a person detected within the range in which voice can be input to the voice acquisition beamforming microphone 191).

The array microphone control unit 192 controls the directivity direction of the voice acquisition beamforming microphone 191 by setting the direction from which it accepts voice input. For example, the array microphone control unit 192 sets the directivity direction of the voice acquisition beamforming microphone 191 in response to the setting instruction sent from the robot controller 170. The array microphone control unit 192 then sends beamforming control information indicating the directivity direction of the voice acquisition beamforming microphone 191 to the noise cancellation unit 220 and the echo cancellation unit 230, both described later.

The person detection unit 180 has a vision sensor (camera) and a distance measurement sensor (distance sensor) such as an infrared sensor or an ultrasonic sensor, and detects a person present within the range in which voice can be input to the voice acquisition beamforming microphone 191. For example, when the person detection unit 180 uses the vision sensor (camera) or the distance sensor to detect a person within a certain distance (for example, 100 cm) in the directivity direction of the voice acquisition beamforming microphone 191, it outputs the position information of the detected person to the robot controller 170.

Note that when the voice acquisition beamforming microphone 191 has a function of automatically steering its directivity direction toward the direction from which voice arrives, it can itself send beamforming control information indicating the directivity direction to the array microphone control unit 192. In that case, the array microphone control unit 192 sends the directivity direction received from the voice acquisition beamforming microphone 191 to the noise cancellation unit 220 and the echo cancellation unit 230, both described later.

The noise removal device 200 uses the noise cancellation unit 220 and the echo cancellation unit 230, described later, to remove the noise component and the reproduced voice component from the voice signal output from the A/D 140a, and outputs the resulting voice signal to the voice recognition unit 160. As shown in FIG. 16, the noise removal device 200 includes, for example, a voice input detector 210, a noise cancellation unit 220, and an echo cancellation unit 230.

For example, as in the second embodiment described above, the voice input detector 210 detects the phase at which the voice signal and the noise signal are most highly correlated. The voice input detector 210 then aligns the voice signal and the noise signal at that phase, calculates the difference between the average power values of the two signals, and determines whether the calculated difference exceeds a predetermined threshold. Based on the determination result, the voice input detector 210 generates a voice input flag indicating whether the voice signal is mixed into the noise signal and sends the generated flag to the noise cancellation unit 220 (filter coefficient estimator 222). For example, the voice input detector 210 sends “True (= mixed)” as the voice input flag when the voice signal is mixed into the noise signal, and “False (= not mixed)” when it is not.
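
The alignment-then-compare procedure described for the voice input detector 210 might look like the following sketch. The function names, the search range `max_lag`, the threshold value, and the sign convention of the power difference (voice power minus noise power) are assumptions for illustration; the description gives no concrete values.

```python
import math


def best_alignment_lag(voice, noise, max_lag):
    """Lag (in samples) at which the raw cross-correlation of the two frames peaks."""
    def score(lag):
        return sum(voice[i] * noise[i + lag]
                   for i in range(len(voice)) if 0 <= i + lag < len(noise))
    return max(range(-max_lag, max_lag + 1), key=score)


def voice_input_flag(voice, noise, max_lag=8, power_diff_thresh=0.05):
    """True when the aligned frames differ in mean power by more than the threshold."""
    lag = best_alignment_lag(voice, noise, max_lag)
    # Keep only the overlapping samples once the frames are aligned.
    pairs = [(voice[i], noise[i + lag])
             for i in range(len(voice)) if 0 <= i + lag < len(noise)]
    p_voice = sum(v * v for v, _ in pairs) / len(pairs)
    p_noise = sum(n * n for _, n in pairs) / len(pairs)
    return (p_voice - p_noise) > power_diff_thresh


# Example frames: the voice-path frame is the noise frame delayed by two samples,
# once at the same level and once three times louder.
noise = [0.2 * math.sin(0.3 * i) for i in range(64)]
same_level = [0.0, 0.0] + noise[:-2]
louder = [3.0 * v for v in same_level]
```

Comparing power only over the overlapping, aligned samples keeps the delay between the two microphones from biasing the level difference.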

As shown in FIG. 16, the noise cancellation unit 220 includes an FIR filter 221, a filter coefficient estimator 222, and a filter coefficient initial value memory 223.

The filter coefficient initial value memory 223 stores, for each directivity direction that can be set in advance for the voice acquisition beamforming microphone 191, an initial value of the filter coefficients for noise cancellation (NC) in association with that direction.

The filter coefficient estimator 222 updates the filter coefficients for noise cancellation (NC) based on the noise signal output from the A/D 140b and sends the updated filter coefficients to the FIR filter 221. For example, the filter coefficient estimator 222 first reads, from the filter coefficient initial value memory 223, the initial value of the NC filter coefficients corresponding to the beamforming control information (information indicating the directivity direction) sent from the array microphone control unit 192.

The filter coefficient estimator 222 then controls the update of the filter coefficients for noise cancellation (NC) based on the voice reproduction flag sent from the robot controller 170. For example, when both a voice reproduction flag of “False” and a voice input flag of “False” have been input at the timing for updating the filter coefficients, the filter coefficient estimator 222 determines that no voice signal is mixed into the noise signal and that no voice is being reproduced. Starting from the initial value read from the filter coefficient initial value memory 223, the filter coefficient estimator 222 then updates the NC filter coefficients so as to bring the noise component contained in the voice signal to zero. After the update, the filter coefficient estimator 222 sends the updated NC filter coefficients to the FIR filter 221.

Note that the filter coefficient estimator 222 overwrites the entry in the filter coefficient initial value memory 223 associated with the beamforming control information (information indicating the directivity direction) with the updated filter coefficients.

Conversely, when at least one of a voice reproduction flag of “True” and a voice input flag of “True” has been input at the timing for updating the filter coefficients, the filter coefficient estimator 222 stops updating the filter coefficients. When the update is stopped, the filter coefficient estimator 222 sends the previously used filter coefficients for noise cancellation (NC) to the FIR filter 221.

As shown in FIG. 16, the echo cancellation unit 230 includes an FIR filter 231, a filter coefficient estimator 232, and a filter coefficient initial value memory 233.

The filter coefficient initial value memory 233 stores, for each directivity direction that can be set in advance for the voice acquisition beamforming microphone 191, an initial value of the filter coefficients for echo cancellation (EC) in association with that direction.

The filter coefficient estimator 232 updates the filter coefficients for echo cancellation (EC) based on the reproduced voice signal output from the A/D 140c and sends the updated filter coefficients to the FIR filter 231. For example, the filter coefficient estimator 232 first reads, from the filter coefficient initial value memory 233, the initial value of the EC filter coefficients corresponding to the beamforming control information (information indicating the directivity direction) sent from the array microphone control unit 192.

The filter coefficient estimator 232 then controls the update of the filter coefficients for echo cancellation (EC) based on the voice reproduction flag sent from the robot controller 170. For example, when a voice reproduction flag of “True” has been input at the timing for updating the filter coefficients, the filter coefficient estimator 232 determines that voice is being reproduced. Starting from the initial value read from the filter coefficient initial value memory 233, the filter coefficient estimator 232 then updates the EC filter coefficients so as to bring the reproduced voice component contained in the voice signal to zero. After the update, the filter coefficient estimator 232 sends the updated EC filter coefficients to the FIR filter 231.

Note that the filter coefficient estimator 232 overwrites the entry in the filter coefficient initial value memory 233 associated with the beamforming control information (information indicating the directivity direction) with the updated filter coefficients.

When a voice reproduction flag of “False” has been input at the timing for updating the filter coefficients, the filter coefficient estimator 232 determines that no voice is being reproduced and stops updating the filter coefficients. When the update is stopped, the filter coefficient estimator 232 sends the previously used filter coefficients for echo cancellation (EC) to the FIR filter 231.
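
The per-direction handling of coefficients described for both cancellation units can be sketched as below. The class names are hypothetical, and the adaptation rule used here is a normalized LMS (NLMS) step chosen purely for illustration; the description does not name the algorithm used by the filter coefficient estimators. The sketch also assumes that stored coefficients are reloaded whenever the directivity direction changes.

```python
class CoefficientMemory:
    """Per-direction coefficient store (stand-in for memories 223 and 233)."""

    def __init__(self, initial_by_direction):
        self._table = {d: list(c) for d, c in initial_by_direction.items()}

    def load(self, direction):
        return list(self._table[direction])

    def store(self, direction, coeffs):
        self._table[direction] = list(coeffs)


class DirectionalCanceller:
    """Adaptive FIR canceller that starts from direction-specific coefficients."""

    def __init__(self, memory, direction, mu=0.5, eps=1e-8):
        self.memory = memory
        self.direction = direction
        self.mu = mu      # NLMS step size (assumed value)
        self.eps = eps    # guards against division by zero
        self.w = memory.load(direction)

    def set_direction(self, direction):
        """On a beam direction change, restart from the stored coefficients."""
        if direction != self.direction:
            self.direction = direction
            self.w = self.memory.load(direction)

    def process(self, reference, mic_sample, update_allowed):
        """Cancel one sample; adapt and write back only when updating is allowed.

        The returned residual is computed with the coefficients in force
        before this call's update.
        """
        estimate = sum(wi * xi for wi, xi in zip(self.w, reference))
        error = mic_sample - estimate
        if update_allowed:
            norm = sum(xi * xi for xi in reference) + self.eps
            self.w = [wi + self.mu * error * xi / norm
                      for wi, xi in zip(self.w, reference)]
            self.memory.store(self.direction, self.w)
        return error
```

For the noise cancellation unit, `update_allowed` would be true only when both the voice reproduction flag and the voice input flag are “False”; for the echo cancellation unit, only when the voice reproduction flag is “True”. Writing the adapted coefficients back per direction means that a later return to the same direction resumes from the most recent estimate rather than from the factory preset.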

[Effects of Embodiment 5]
As described above, according to the fifth embodiment, the noise removal device 200 stores, in the filter coefficient initial value memory 223, initial values of the noise cancellation filter coefficients corresponding to the directivity directions that can be set in advance for the voice acquisition beamforming microphone 191. The noise removal device 200 likewise stores, in the filter coefficient initial value memory 233, initial values of the echo cancellation filter coefficients corresponding to those directivity directions. Even immediately after the directivity direction of the beamforming microphone is changed, the noise removal device 200 updates the noise cancellation and echo cancellation filter coefficients starting from the initial values that correspond to the new directivity direction. This avoids the delay that would otherwise occur while the filter coefficient update catches up with the change.

The sixth embodiment below describes a case in which the A/D 140b, which mainly receives noise from the noise acquisition microphone, also receives the reproduced voice originating from the service providing robot 100. The noise removal device 200 according to the sixth embodiment updates the filter coefficients for noise cancellation or echo cancellation in a single, physically or functionally integrated processing unit.

FIG. 17 is a diagram illustrating the configuration according to the sixth embodiment. As shown in the figure, the service providing robot 100 includes a voice acquisition microphone (MIC_S) 110, a noise acquisition microphone (MIC_N) 120, and a voice playback speaker 130. The service providing robot 100 further includes A/Ds 140a and 140b, a D/A 150, a voice recognition unit 160, and a robot controller 170.

The A/D 140a converts the analog voice signal input via the voice acquisition microphone 110 into a digital voice signal and outputs it to the noise removal apparatus 200.

The A/D 140b converts the analog noise signal input via the noise acquisition microphone 120 into a digital noise signal and outputs it to the noise removal apparatus 200. The A/D 140b also converts the analog playback voice signal input via the D/A 150, described later, into a digital voice signal and outputs it to the noise removal apparatus 200.

The voice recognition unit 160 performs recognition processing on the voice signal received from the noise removal apparatus 200 and sends the recognition result to the robot controller 170.

The robot controller 170 generates a digital playback voice signal according to the voice recognition result sent from the voice recognition unit 160 and sends the generated playback voice signal to the D/A 150. When sending a playback voice signal to the D/A 150, the robot controller 170 also outputs a voice playback flag, indicating that voice whose sound source is the service providing robot 100 is being played back, to the noise removal apparatus 200 (filter coefficient estimator 252) described later. For example, the robot controller 170 outputs "True (= voice playback)" as the voice playback flag while in the voice playback state, and outputs "False (= no voice playback)" while in the non-playback state.

The noise removal apparatus 200 uses the noise cancellation/echo cancellation unit 250, described later, to remove the noise component and the playback voice component from the voice signal output from the A/D 140a, and outputs the resulting voice signal to the voice recognition unit 160. As shown in FIG. 17, for example, the noise removal apparatus 200 includes a voice input detector 210 and the noise cancellation/echo cancellation unit 250.

The voice input detector 210 uses the voice signal output from the A/D 140a and the noise signal output from the A/D 140b to detect whether the voice signal is mixed into the noise signal. The voice input detector 210 then sends a voice input flag indicating whether the voice signal is mixed into the noise signal to the noise cancellation/echo cancellation unit 250 (filter coefficient estimator 252), described later.

For example, as in the second embodiment described above, the voice input detector 210 detects the phase at which the voice signal and the noise signal are most highly correlated. It then superimposes the voice signal and the noise signal at that phase, calculates the difference between the average power values of the two signals, and determines whether the calculated difference exceeds a predetermined threshold. Based on the determination result, the voice input detector 210 generates a voice input flag indicating whether the voice signal is mixed into the noise signal, and sends the generated flag to the noise cancellation/echo cancellation unit 250 (filter coefficient estimator 252). For example, the voice input detector 210 sends "True (= mixing present)" as the voice input flag when the voice signal is mixed into the noise signal, and sends "False (= no mixing)" when it is not.
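The detection steps above can be sketched as follows. This is only an illustration under stated assumptions: the function name, the lag search range, both threshold values, the use of a normalized correlation, and the sign of the power difference (voice minus noise) are not specified in this section; the additional correlation threshold follows the wording of the abstract and Supplementary Note 1.

```python
import numpy as np


def detect_voice_mixing(voice, noise, max_lag, corr_threshold, power_threshold):
    """Return (voice_input_flag, best_lag) for two equal-length frames."""
    voice = np.asarray(voice, dtype=float)
    noise = np.asarray(noise, dtype=float)
    best_lag, best_corr, best_pair = 0, -np.inf, (voice, noise)
    for lag in range(-max_lag, max_lag + 1):
        # Shift one frame against the other and keep only the overlap.
        if lag >= 0:
            v, n = voice[lag:], noise[:len(noise) - lag]
        else:
            v, n = voice[:len(voice) + lag], noise[-lag:]
        denom = np.linalg.norm(v) * np.linalg.norm(n)
        corr = float(np.dot(v, n) / denom) if denom > 0 else 0.0
        if corr > best_corr:
            best_lag, best_corr, best_pair = lag, corr, (v, n)
    v, n = best_pair
    # Difference of mean powers at the best alignment (assumed sign: voice minus noise).
    power_diff = float(np.mean(v ** 2) - np.mean(n ** 2))
    flag = bool(best_corr > corr_threshold and power_diff > power_threshold)
    return flag, best_lag


rng = np.random.default_rng(0)
speech = rng.standard_normal(256)          # frame from A/D 140a
leaked = 0.3 * np.roll(speech, 3)          # attenuated, delayed copy reaching MIC_N
unrelated = rng.standard_normal(256)       # ambient noise with no speech in it
mixed_flag, lag = detect_voice_mixing(speech, leaked, 5, 0.5, 0.2)
clean_flag, _ = detect_voice_mixing(speech, unrelated, 5, 0.5, 0.2)
```

In this sketch the flag is raised only when both conditions hold, so strongly correlated signals of similar level do not by themselves stop the coefficient update.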

As shown in FIG. 17, the noise cancellation/echo cancellation unit 250 includes an FIR filter 251, a filter coefficient estimator 252, and a filter coefficient initial value memory 253.

The FIR filter 251 uses the filter coefficients for noise cancellation (NC) or echo cancellation (EC) sent from the filter coefficient estimator 252 to remove the noise component or the playback voice component from the voice signal output from the A/D 140a. The noise cancellation (NC) filter coefficients are used as the coefficients of the transfer function applied to the noise signal so that the noise component contained in the voice signal becomes "0". Likewise, the echo cancellation (EC) filter coefficients are used as the coefficients of the transfer function applied to the playback voice signal so that the playback voice component contained in the voice signal becomes "0".
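As a minimal sketch of this removal step, the reference signal (noise or playback voice) is passed through the FIR coefficients and the result is subtracted from the voice signal. The function and variable names and the sample values are illustrative assumptions.

```python
import numpy as np


def fir_cancel(primary, reference, coeffs):
    """Subtract the FIR-filtered reference from the primary (voice) signal."""
    primary = np.asarray(primary, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # Causal FIR output: the first len(primary) samples of the full convolution.
    estimate = np.convolve(reference, coeffs)[:len(primary)]
    return primary - estimate


reference = np.array([1.0, 2.0, 3.0, 4.0])   # noise signal from A/D 140b
coeffs = np.array([0.5, 0.25])               # NC coefficients (assumed already converged)
speech = np.array([1.0, -1.0, 1.0, -1.0])    # what the speaker actually said
leak = np.array([0.5, 1.25, 2.0, 2.75])      # noise as it appears at MIC_S
cleaned = fir_cancel(speech + leak, reference, coeffs)
```

When the coefficients match the path from the reference to the voice microphone, the subtracted estimate equals the leaked component and only the speech remains.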

The filter coefficient initial value memory 253 stores both the initial values of the noise cancellation (NC) filter coefficients and the initial values of the echo cancellation (EC) filter coefficients.

The filter coefficient estimator 252 updates the filter coefficients for noise cancellation (NC) or echo cancellation (EC) based on the noise signal or the playback voice signal output from the A/D 140b, and sends the updated filter coefficients to the FIR filter 251.
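This section does not name the adaptation algorithm used by the filter coefficient estimator 252. Purely as an assumption for illustration, the sketch below uses a normalized LMS (NLMS) update, a common choice for adaptive noise and echo cancellers; the step size, the tap count, and the simulated path are hypothetical.

```python
import numpy as np


def nlms_update(coeffs, ref_window, primary_sample, step=0.5, eps=1e-8):
    """One NLMS step; ref_window[k] is the reference sample delayed by k."""
    error = primary_sample - np.dot(coeffs, ref_window)
    norm = np.dot(ref_window, ref_window) + eps   # eps guards against division by zero
    return coeffs + step * error * ref_window / norm, error


rng = np.random.default_rng(1)
true_path = np.array([0.6, -0.3, 0.1])    # hypothetical path from reference to MIC_S
reference = rng.standard_normal(2000)     # signal from A/D 140b
primary = np.convolve(reference, true_path)[:len(reference)]

coeffs = np.zeros(3)
for i in range(2, len(reference)):
    window = reference[i - 2:i + 1][::-1]  # newest sample first
    coeffs, _ = nlms_update(coeffs, window, primary[i])
```

With no speech present in the primary signal the estimate converges toward the simulated path, which is why the update is stopped whenever the voice input flag indicates mixing.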

FIG. 18 shows the correspondence between the flag states and the operation of the filter coefficient estimator 252. FIG. 18 is a diagram explaining the noise removal apparatus according to the sixth embodiment. The column items in the figure indicate the type of signal input to the filter coefficient estimator 252. The row items indicate the flag contents corresponding to the input signal, the type of filter coefficient initial value to load, and whether the filter coefficients are updated.

For example, as shown in the leftmost column of FIG. 18, when only the noise signal is input to the filter coefficient estimator 252, the flag contents are voice input flag "OFF" and voice playback flag "OFF". The voice input flag "ON" indicates that the voice signal is mixed into the noise signal, and the voice input flag "OFF" indicates that it is not. The voice playback flag "ON" indicates that the playback voice signal is mixed into the voice signal, and the voice playback flag "OFF" indicates that it is not.

When only the noise signal is input to the filter coefficient estimator 252, the type of filter coefficient initial value loaded from the filter coefficient initial value memory 253 is "NC", where "NC" denotes the initial values of the noise cancellation filter coefficients. In this case the filter coefficient update is "ON". A filter coefficient update of "ON" indicates that the filter update should be performed, and "OFF" indicates that the filter update should be stopped.

As shown in the second column from the left in FIG. 18, when only the voice signal is input to the filter coefficient estimator 252, the flag contents are voice input flag "ON" and voice playback flag "OFF". In this case the type of filter coefficient initial value loaded from the filter coefficient initial value memory 253 is "NC", and the filter coefficient update is "OFF".

As shown in the third column from the left in FIG. 18, when only the playback voice signal is input to the filter coefficient estimator 252, the flag contents are voice input flag "OFF" and voice playback flag "ON". In this case the type of filter coefficient initial value loaded from the filter coefficient initial value memory 253 is "EC", where "EC" denotes the initial values of the echo cancellation filter coefficients, and the filter coefficient update is "ON".

As shown in the fourth column from the left in FIG. 18, when both the voice signal and the playback voice signal are input to the filter coefficient estimator 252, the flag contents are voice input flag "ON" and voice playback flag "ON". In this case the type of filter coefficient initial value loaded from the filter coefficient initial value memory 253 is "EC", and the filter coefficient update is "OFF".
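The four columns of FIG. 18 described above amount to a lookup on the two flags. The sketch below encodes "ON" as True and "OFF" as False; the dictionary and function names are hypothetical.

```python
# (voice_input_flag, voice_playback_flag) -> (initial value type, update enabled)
FIG18_TABLE = {
    (False, False): ("NC", True),   # noise signal only
    (True, False): ("NC", False),   # voice signal only
    (False, True): ("EC", True),    # playback voice signal only
    (True, True): ("EC", False),    # voice signal and playback voice signal
}


def select_mode(voice_input_flag, voice_playback_flag):
    """Return which initial values to load and whether to update them."""
    return FIG18_TABLE[(voice_input_flag, voice_playback_flag)]


mode_noise_only = select_mode(False, False)
mode_both = select_mode(True, True)
```

Read this way, the voice playback flag alone selects NC or EC, and the voice input flag alone decides whether the update runs.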

When updating the filter coefficients, the filter coefficient estimator 252 reads the initial values of the noise cancellation (NC) or echo cancellation (EC) filter coefficients from the filter coefficient initial value memory 253 according to the correspondence shown in FIG. 18. The filter coefficient estimator 252 then uses the read initial values to update the filter coefficients in accordance with the same correspondence.

For example, when only the noise signal is input, the filter coefficient estimator 252 loads the initial values of the noise cancellation filter coefficients according to the correspondence shown in FIG. 18. The filter coefficient estimator 252 then updates the noise cancellation filter coefficients starting from those initial values.

When only the voice signal is input, the filter coefficient estimator 252 loads the initial values of the noise cancellation filter coefficients according to the correspondence shown in FIG. 18. The filter coefficient estimator 252 does not update the noise cancellation filter coefficients from those initial values. Instead, it sends the loaded initial values unchanged to the FIR filter 251. The FIR filter 251 removes the noise component from the voice signal using the initial values of the noise cancellation filter coefficients.

When only the playback voice signal is input, the filter coefficient estimator 252 loads the initial values of the echo cancellation filter coefficients according to the correspondence shown in FIG. 18. The filter coefficient estimator 252 then updates the echo cancellation filter coefficients starting from those initial values.

When both the voice signal and the playback voice signal are input, the filter coefficient estimator 252 loads the initial values of the echo cancellation filter coefficients according to the correspondence shown in FIG. 18. The filter coefficient estimator 252 does not update the echo cancellation filter coefficients from those initial values. Instead, it sends the loaded initial values unchanged to the FIR filter 251. The FIR filter 251 removes the playback voice component from the voice signal using the initial values of the echo cancellation filter coefficients.

The filter coefficient estimator 252 overwrites the filter coefficient initial value memory 253 with the updated filter coefficients.
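The load, update, and write-back behavior described for the filter coefficient estimator 252 can be sketched as a single step function. This is an illustration under assumptions: the class and function names are hypothetical, the `adapt` callable stands in for whatever update rule the estimator actually applies, and zero vectors are an assumed starting state for the memory.

```python
import numpy as np


class CoefficientMemory:
    """Hypothetical stand-in for the filter coefficient initial value memory 253."""

    def __init__(self, num_taps):
        self.values = {"NC": np.zeros(num_taps), "EC": np.zeros(num_taps)}


def estimator_step(memory, voice_input_flag, voice_playback_flag, adapt):
    """Load NC or EC values, update them unless voice is mixed in, and write back."""
    kind = "EC" if voice_playback_flag else "NC"
    coeffs = memory.values[kind].copy()
    if not voice_input_flag:
        coeffs = adapt(coeffs)                 # update is "ON"
        memory.values[kind] = coeffs.copy()    # overwrite the stored initial values
    return kind, coeffs                        # coefficients sent to the FIR filter


memory = CoefficientMemory(num_taps=2)
bump = lambda c: c + 1.0                       # placeholder for a real adaptation rule
first = estimator_step(memory, False, False, bump)   # noise only: adapt NC
second = estimator_step(memory, True, False, bump)   # voice only: reuse stored NC
third = estimator_step(memory, True, True, bump)     # voice and playback: reuse stored EC
```

Because the memory is overwritten after each update, the values loaded while the update is stopped are the most recently learned coefficients rather than fixed defaults.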

[Effects of Example 6]
As described above, according to the sixth embodiment, the filter coefficients can be updated in the same manner as in the second embodiment even when the A/D 140b, which mainly receives noise from the noise acquisition microphone 120, also receives the playback voice whose sound source is the service providing robot 100.

Other embodiments of the noise removal apparatus disclosed in the present application are described below.

(Apparatus configuration, etc.)
For example, the components of the noise removal apparatus 200 shown in FIG. 2 are functional and conceptual, and need not be physically configured as illustrated. That is, the specific form of distribution and integration of the noise removal apparatus 200 is not limited to the illustrated one. For example, the FIR filter 221 and the filter coefficient estimator 222 of the noise cancellation unit 220 may be functionally or physically integrated. In this way, all or part of the noise removal apparatus 200 can be functionally or physically distributed or integrated in arbitrary units according to various loads and usage conditions.

The following supplementary notes are further disclosed regarding the embodiments, including the examples above.

(Supplementary Note 1) A noise removal apparatus comprising:
a removal unit that removes a component of a second voice from a first signal acquired from a first input unit to which a first voice is mainly input, using a filter coefficient updated based on a second signal acquired from a second input unit to which the second voice is mainly input;
a mixing detection unit that calculates a correlation value indicating the correlation between the first signal and the second signal, determines whether the calculated correlation value exceeds a predetermined threshold, compares the output level of the first signal with the output level of the second signal to determine whether the difference between the two output levels exceeds a predetermined threshold, and, when it determines that the correlation value exceeds the predetermined threshold and that the output level difference exceeds the predetermined threshold, detects that the component of the first voice is mixed into the second signal; and
a control unit that performs control to stop updating the filter coefficient when the mixing detection unit detects that the component of the first voice is mixed into the second signal.

(Supplementary Note 2) The noise removal apparatus according to Supplementary Note 1, further comprising:
a playback sound removal unit that removes a component of a third voice from the first signal output from the first input unit, using a filter coefficient updated based on a third signal, which is a signal corresponding to the third voice played back and output from a voice output device having the noise removal apparatus; and
a playback detection unit that detects playback of the third voice,
wherein, when the playback detection unit detects playback of the third voice, the control unit performs control to stop updating the filter coefficient that is updated based on the second signal, and updates the filter coefficient that is updated based on the third signal.

(Supplementary Note 3) The noise removal apparatus according to Supplementary Note 1, comprising:
a removal unit that removes a component of a second voice from a first signal acquired from a first input unit to which a first voice is mainly input, using a filter coefficient updated based on a second signal acquired from a second input unit to which the second voice is mainly input;
a person detection unit that detects a person present within a range from which the first voice can be input to the first input unit; and
a control unit that performs control to stop updating the filter coefficient when the person detection unit detects a person.

(Supplementary Note 4) A noise removal apparatus comprising:
a first input unit that has directivity and to which a first voice is mainly input;
a second input unit to which a second voice is mainly input;
a third input unit to which a third voice, whose sound source is mainly the noise removal apparatus, is input;
a first storage unit that stores, for each directivity direction of the first input unit, an initial value of a first filter coefficient that is updated based on a second signal acquired from the second input unit;
a second storage unit that stores, for each directivity direction of the first input unit, an initial value of a second filter coefficient that is updated based on a third signal acquired from the third input unit;
a person detection unit that detects a person present within a range from which the first voice can be input to the first input unit;
a direction control unit that, when the person detection unit detects a person, performs control to point the directivity direction of the first input unit in the detected direction;
a mixing detection unit that detects whether the component of the first voice is mixed into the second signal;
a playback detection unit that detects playback of the third voice;
a first update unit that, according to the detection result of the mixing detection unit and the detection result of the playback detection unit, reads from the first storage unit the initial value of the first filter coefficient corresponding to the directivity direction controlled by the direction control unit, updates the first filter coefficient using the read initial value, and stores the updated first filter coefficient in the first storage unit in place of the read initial value; and
a second update unit that, according to the detection result of the mixing detection unit and the detection result of the playback detection unit, reads from the second storage unit the initial value of the second filter coefficient corresponding to the directivity direction controlled by the direction control unit, updates the second filter coefficient using the read initial value, and stores the updated second filter coefficient in the second storage unit in place of the read initial value.

(Supplementary Note 5) A noise removal apparatus comprising:
a first input unit to which a first voice is mainly input;
a second input unit to which a second voice and a third voice, whose sound source is the noise removal apparatus, are input;
a first storage unit that stores an initial value of a first filter coefficient that is updated based on a second signal acquired from the second input unit;
a second storage unit that stores an initial value of a second filter coefficient that is updated based on a third signal acquired from the second input unit;
a mixing detection unit that detects whether the component of the first voice is mixed into the second signal;
a playback detection unit that detects playback of the third voice; and
an update unit that, when the mixing detection unit detects that the component of the first voice is not mixed into the second signal, reads the initial value of the first filter coefficient from the first storage unit, updates the first filter coefficient using the read initial value, and stores the updated first filter coefficient in the first storage unit, and that, when the playback detection unit detects playback of the third voice, reads the initial value of the second filter coefficient from the second storage unit, updates the second filter coefficient using the read initial value, and stores the updated second filter coefficient in the second storage unit.

DESCRIPTION OF SYMBOLS
100 Service providing robot
110 Voice acquisition microphone
120 Noise acquisition microphone
130 Voice playback speaker
140a to 140c A/D (analog-to-digital converter)
150 D/A (digital-to-analog converter)
160 Voice recognition unit
170 Robot controller
180 Person detection unit
191 Beamform microphone for voice acquisition
192 Array microphone control unit
200 Noise removal apparatus
210 Voice input detector
211a, 211b Delay tap
212a, 212b Frame division processing unit
213 Cross-correlation detector
214 Signal level comparator
214a Mean-square calculator
214b Power comparator
215 Flag generator
220 Noise cancellation unit
221 FIR (finite impulse response) filter
222 Filter coefficient estimator
223 Filter coefficient initial value memory
230 Echo cancellation unit
231 FIR (finite impulse response) filter
232 Filter coefficient estimator
233 Filter coefficient initial value memory
240 Received voice input detector
241a, 241b Delay tap
242a, 242b Frame division processing unit
243 Cross-correlation detector
244 Signal level comparator
244a Mean-square calculator
244b Power comparator
245 Flag generator
250 Noise cancellation/echo cancellation unit
251 FIR filter
252 Filter coefficient estimator
253 Filter coefficient initial value memory
300 Hands-free telephone
310 Voice acquisition microphone
320 Noise acquisition microphone
330 Voice playback speaker
340a, 340b A/D (analog-to-digital converter)
350 D/A (digital-to-analog converter)

Claims

A noise removal apparatus comprising:
a removal unit that, using a first filter coefficient updated based on a second signal output in correspondence with a second voice input to a second input unit, removes a component of the second voice from a first signal output in correspondence with a first voice input to a first input unit, and that, using a second filter coefficient updated based on a third signal corresponding to a third voice output from a voice output device, removes a component of the third voice from the first signal;
a detection unit that calculates a correlation value indicating the correlation between the first signal and the second signal, determines whether the calculated correlation value exceeds a predetermined threshold, compares the output level of the first signal with the output level of the second signal to determine whether the difference between the two output levels exceeds a predetermined threshold, detects that the component of the first voice is included in the second signal when it determines that the correlation value exceeds the predetermined threshold and that the output level difference exceeds the predetermined threshold, and detects that the component of the third voice is included in the first signal by detecting that the third signal is output from the voice output device; and
a control unit that performs control to stop updating the first filter coefficient when the detection unit detects that the second signal includes the component of the first voice, or when the detection unit detects that the component of the third voice is included, and that performs control to stop updating the second filter coefficient when the detection unit does not detect that the first signal includes the component of the third voice.

The noise removal apparatus according to claim 1, wherein the first input unit is a beamform microphone whose directivity direction is controlled by a directivity direction control unit according to the position of a speaker, the position being detected by a detection unit that detects the position of a speaker speaking toward the first input unit, and wherein the removal unit removes the component of the second voice and the component of the third voice from the first signal using, from among the first filter coefficients and the second filter coefficients that are updated for each directivity direction of the first input unit, the first filter coefficient and the second filter coefficient corresponding to the directivity direction of the first input unit controlled by the directivity direction control unit.