JP2009216835A - Sound echo canceling device, in-vehicle device and sound echo canceling method

Info

Publication number: JP2009216835A
Application number: JP2008058686A
Authority: JP
Inventors: Kentaro Koga; 健太郎古賀; Yasuo Ariki; 康雄有木; Tetsuya Takiguchi; 哲也滝口
Original assignee: Denso Ten Ltd; Kobe University NUC
Current assignee: Denso Ten Ltd; Kobe University NUC
Priority date: 2008-03-07
Filing date: 2008-03-07
Publication date: 2009-09-24

Abstract

PROBLEM TO BE SOLVED: To provide a sound echo canceling device that cancels sound echo with high accuracy even when the environment in which sound echo canceling is performed to extract an utterance signal uttered by a person is complicated.

SOLUTION: In an environment in which speech recognition is performed to extract an utterance signal uttered by a person from an observation signal representing the collected sound, the sound echo canceling device: measures a transfer function beforehand for each assumed situation and holds the measured transfer functions; receives a reference signal, representing the speech signal output into the environment over a plurality of channels, with the plurality of channels kept as they are; estimates the sound echo for each of the received channels using the held transfer functions; cancels the sound echo using each estimated sound echo; and selects, from the results in which the sound echo has been canceled, the result with the highest speech likelihood, which expresses how strongly the result shows the characteristics of uttered speech.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to an acoustic echo removal apparatus, an in-vehicle apparatus, and an acoustic echo removal method that perform acoustic echo removal to extract an utterance signal uttered by a person from an observation signal representing collected sound.

Conventionally, when an in-vehicle speech recognition system performs speech recognition while music is being output from the speakers in the vehicle, both the speech to be recognized and sound signals other than that speech (acoustic echo) enter the speech recognition microphone, which hinders speech recognition. Various techniques have therefore been disclosed that remove the acoustic echo with an acoustic echo canceller to secure the speech recognition rate.

For example, as shown in FIG. 7, an audio echo canceller apparatus has been disclosed that combines the 2ch reference signals output from the speakers into 1ch (monaural), inputs the combined 1ch reference signal to an adaptive filter to estimate the acoustic echo in the vehicle, cancels the estimated acoustic echo from the observation signal representing all the sound collected by the microphone, and feeds the result back to the adaptive filter (see Non-Patent Document 1).

Specifically, the audio echo canceller apparatus estimates the acoustic echo by applying the adaptive filter to the observation signal collected by the microphone, removes the estimated acoustic echo from the observation signal to extract only the uttered speech, and outputs the extracted speech to a speech recognition engine. The apparatus then updates the adaptive filter using the cancellation result obtained by removing the acoustic echo from the observation signal, and thereby extracts the uttered speech with higher accuracy.

More specifically, the audio echo canceller apparatus combines the reference signal (Rch) output from the speakers (FR, RR) and the reference signal (Lch) output from the speakers (FL, RL) into one signal and inputs it to the adaptive filter. In other words, the apparatus here treats the acoustic echoes output from the four speakers as a single 1ch acoustic echo. Using the adaptive filter to which the combined 1ch reference signal is input, the apparatus estimates the acoustic echo, removes the estimated acoustic echo from the observation signal to extract only the uttered speech (speech recognition), outputs the extracted speech to the speech recognition engine, and updates the adaptive filter using the estimated acoustic echo.

That is, by repeatedly updating the adaptive filter using the estimated acoustic echo, the audio echo canceller apparatus can bring the characteristics of the adaptive filter closer to the actual transfer characteristics (make them converge) and can extract the uttered speech with higher accuracy.
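The conventional scheme described above (mono downmix followed by an adaptive filter that is updated from the cancellation result) can be sketched as follows. This is only a minimal illustration, assuming a normalized LMS (NLMS) update, which the source does not specify; the function name `nlms_mono_echo_canceller` and the parameters `taps`, `mu`, and `eps` are hypothetical.

```python
from typing import List, Sequence


def nlms_mono_echo_canceller(
    ref_left: Sequence[float],
    ref_right: Sequence[float],
    observed: Sequence[float],
    taps: int = 8,
    mu: float = 0.5,
    eps: float = 1e-8,
) -> List[float]:
    """Return the residual (observed minus estimated echo), sample by sample.

    Assumption: NLMS is used as the adaptive algorithm; the source only
    says "adaptive filter".
    """
    weights = [0.0] * taps   # adaptive filter coefficients
    history = [0.0] * taps   # most recent mono reference samples
    residual: List[float] = []
    for x_l, x_r, y in zip(ref_left, ref_right, observed):
        mono = 0.5 * (x_l + x_r)  # combine the 2ch reference into 1ch
        history = [mono] + history[:-1]
        echo_est = sum(w * x for w, x in zip(weights, history))
        err = y - echo_est        # cancellation result
        power = sum(x * x for x in history) + eps
        # Feed the cancellation result back into the filter (NLMS step).
        weights = [w + mu * err * x / power for w, x in zip(weights, history)]
        residual.append(err)
    return residual
```

With a single short echo path the residual shrinks as the filter converges; the difficulty raised in the text is that a single mono filter of this kind may not track the more complicated transfer characteristics of a vehicle interior.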

Non-Patent Document 1: Abderrahman, “Speech Enhancement Using Multi-Reference Reduction in a Vehicle Environment”, pp. 838-841, Interspeech 2007

However, the conventional technique described above has the problem that acoustic echo removal accuracy deteriorates when the environment in which acoustic echo removal is performed to extract a speech signal uttered by a person is complicated.

Specifically, the conventional technique combines the reference signals into 1ch, estimates the acoustic echo, and removes the estimated acoustic echo from the observation signal collected by the microphone to perform speech recognition. When the transfer characteristics are relatively simple, as in an anechoic chamber or another room with little sound reflection, the adaptive filter can approximate them, so the acoustic echo can be cancelled sufficiently. In a complicated environment with complicated transfer characteristics, such as a vehicle interior, however, the transfer characteristics of the adaptive filter do not approach those of the environment even when the filter is updated with the estimated acoustic echo, and the acoustic echo cannot be cancelled sufficiently. As a result, speech recognition accuracy deteriorates.

The present invention has been made to solve the above problem of the conventional technique. Its object is to provide an acoustic echo removal apparatus, an in-vehicle apparatus, and an acoustic echo removal method that can remove acoustic echo with high accuracy even when the environment in which acoustic echo removal is performed to extract a speech signal uttered by a person is complicated.

To solve the above problem and achieve the object, the present invention comprises: transfer characteristic holding means that, in an environment in which acoustic echo removal is performed to extract an utterance signal uttered by a person from an observation signal representing collected sound, holds transfer characteristics measured in advance for each situation that can be assumed; reference signal receiving means that receives a reference signal, representing the sound signal output into the environment over a plurality of channels, with the plurality of channels kept as they are; acoustic echo estimation means that estimates, for each of the multi-channel reference signals received by the reference signal receiving means and using each of the transfer characteristics held by the transfer characteristic holding means, an acoustic echo representing sound other than the utterance signal; acoustic echo removal means that performs the acoustic echo removal using each of the acoustic echoes estimated by the acoustic echo estimation means; and acoustic echo removal result selection means that selects, from the results of acoustic echo removal by the acoustic echo removal means, the result with the highest speech likelihood, which represents the strength of the features of the utterance signal.

According to the present invention, acoustic echo can be removed with high accuracy even when the environment in which acoustic echo removal is performed to extract a speech signal uttered by a person is complicated.

Embodiments of the acoustic echo removal apparatus, in-vehicle apparatus, and acoustic echo removal method according to the present invention are described below in detail with reference to the accompanying drawings. The following first describes the outline and features of the acoustic echo removal apparatus according to the present embodiment, then its configuration and processing flow, and finally various modifications of the embodiment.

[Outline and features of the acoustic echo removal apparatus]
First, the outline and features of the acoustic echo removal apparatus according to the embodiment are described with reference to FIGS. 1 and 2. FIGS. 1 and 2 are diagrams for explaining the outline and features of the acoustic echo removal apparatus according to the first embodiment.

The acoustic echo removal apparatus (acoustic echo canceller) according to the first embodiment is connected to a speech recognition engine that executes various processes using the acoustic echo removal result from which the speech signal uttered by a person has been extracted. In outline, it extracts the utterance signal from an observation signal representing sound collected in an environment such as a classroom or a vehicle interior. Its main feature is that it can remove acoustic echo with high accuracy even when the environment in which acoustic echo removal is performed is complicated.

This main feature is described concretely using an example in which the acoustic echo removal apparatus according to the first embodiment is applied to a vehicle interior. The apparatus is connected to a speech recognition engine, which performs voice-operated functions such as car navigation operation and speaker volume control based on the speech recognition result received from the acoustic echo removal apparatus.

In this state, the acoustic echo removal apparatus according to the first embodiment measures, in advance, the transfer characteristics for each situation that can be assumed in the environment in which acoustic echo removal is performed to extract an utterance signal uttered by a person from an observation signal representing collected sound, and holds the measured transfer characteristics.

As a concrete example, as shown in FIG. 1, in a vehicle that is equipped with four speakers (FR, RR, FL, RL) and in which speech recognition is performed, the acoustic echo removal apparatus measures impulse responses for the 12 occupant seating patterns that can be assumed, obtains the transfer characteristics in advance from the measured impulse responses, and holds them. Situations that can be assumed in a vehicle, that is, situations likely to yield different impulse responses, include the vehicle type and the arrangement of objects in addition to the occupant seating; occupant seating is used as the example here because it is the situation that produces the largest differences in impulse response. Seating patterns other than those shown in FIG. 1 are those with no one in the driver's seat and those with two occupants concentrated on one side of the rear seat; these are excluded here because they are unlikely in a vehicle in which voice operation (speech recognition) is performed.

The acoustic echo removal apparatus then receives the reference signal, which represents the sound signal output into the environment over a plurality of channels, with the channels kept as they are. In the example above, as shown in FIG. 2, the apparatus receives the 2ch reference signal, consisting of the reference signal (1ch) output from the speakers on the right side of the vehicle (FR, RR) and the reference signal (1ch) output from the speakers on the left side of the vehicle (FL, RL), as a 2ch stereo signal without combining it into a 1ch monaural signal.

Next, for each of the received multi-channel reference signals, the acoustic echo removal apparatus estimates the acoustic echo, which represents sound other than the uttered speech, using each of the transfer characteristics it holds. In the example above, for each of the received 2ch reference signals output from the left and right speakers, the apparatus inputs each of the held transfer characteristics for the 12 occupant seating patterns into the filter that performs speech recognition and estimates 12 patterns of acoustic echo representing sound other than the uttered speech. For ease of explanation, FIG. 2 illustrates the case in which the acoustic echo is estimated for the 12 transfer characteristic patterns shown in FIG. 1 using four filters: one corresponding to FR, one to RR, one to FL, and one to RL. The present invention is not limited to this; the signals of all the speakers may instead be input to a single filter, and the 12 held transfer characteristic patterns may be read out one at a time to estimate the acoustic echo.
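The estimation step just described can be sketched as follows: each stored pattern holds one impulse response per reference channel, and the echo estimate for that pattern is the sum of the two channel convolutions. This is only an illustration under stated assumptions: the transfer characteristics are modeled as short FIR impulse responses, and the names `convolve`, `estimate_echoes`, and the `(h_left, h_right)` pair layout are hypothetical, not taken from the source.

```python
from typing import List, Sequence, Tuple

# Assumed layout: one (h_left, h_right) impulse-response pair per seating pattern.
ImpulsePair = Tuple[Sequence[float], Sequence[float]]


def convolve(signal: Sequence[float], impulse: Sequence[float]) -> List[float]:
    """Causal FIR convolution truncated to len(signal) samples."""
    out = [0.0] * len(signal)
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(impulse):
            if n - k < 0:
                break
            acc += h * signal[n - k]
        out[n] = acc
    return out


def estimate_echoes(
    ref_left: Sequence[float],
    ref_right: Sequence[float],
    bank: Sequence[ImpulsePair],
) -> List[List[float]]:
    """Return one echo estimate per stored pattern, keeping L and R separate."""
    echoes = []
    for h_left, h_right in bank:
        left = convolve(ref_left, h_left)
        right = convolve(ref_right, h_right)
        echoes.append([a + b for a, b in zip(left, right)])
    return echoes
```

In the example of the text, `bank` would hold 12 entries, one per occupant seating pattern, so the function would return 12 echo estimates.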

The acoustic echo removal apparatus then performs acoustic echo removal using each of the estimated acoustic echoes. In the example above, the observation signal is the signal in which the microphone has collected together the reference signals output from the four speakers (reference signals output at the same timing as those used to estimate the acoustic echo) and the utterance signal uttered by an occupant while those reference signals were being output. The apparatus performs acoustic echo removal on this observation signal using a filter to which the 12 estimated acoustic echo patterns are applied, and obtains 12 patterns of acoustic echo removal results (utterance signal extraction results).

Next, from the acoustic echo removal results, the apparatus selects the result with the highest speech likelihood, which represents the strength of the features of the uttered speech that remains after the acoustic echo has been removed. In the example above, for each of the 12 extraction results, the apparatus calculates a speech likelihood from acoustic features such as MFCC (Mel-Frequency Cepstral Coefficients) and selects the acoustic echo removal result with the highest calculated speech likelihood. The apparatus then outputs the selected result to the speech recognition engine, and the speech recognition engine executes various operations based on the received result.
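The selection step can be sketched as a simple argmax over the candidate results. The sketch below is an assumption-laden illustration: `select_best_result` is a hypothetical name, and the scoring function is passed in as a parameter because the actual score in the source (a speech likelihood computed from MFCC features) is not implemented here.

```python
from typing import Callable, List, Sequence, Tuple


def select_best_result(
    candidates: Sequence[Sequence[float]],
    speech_likelihood: Callable[[Sequence[float]], float],
) -> Tuple[int, Sequence[float]]:
    """Return (pattern index, candidate) with the highest speech likelihood."""
    scores: List[float] = [speech_likelihood(c) for c in candidates]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, candidates[best]
```

Passing the score as a callable keeps the selection logic independent of how the likelihood is modeled, so a different speech model could be swapped in without changing this step.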

The acoustic echo removal apparatus may execute the above processing every time an utterance is made, or at predetermined timings, for example every 10 minutes.

As described above, in an in-vehicle multi-speaker environment, the acoustic echo removal apparatus according to the first embodiment calculates the assumable transfer characteristics in advance, receives the 2ch reference signals independently, estimates the acoustic echo from the received 2ch reference signals and the precomputed transfer characteristics, and cancels the acoustic echo from the observation signal. As a result, as stated in the main feature above, acoustic echo can be removed with high accuracy even when the environment in which acoustic echo removal is performed to extract a speech signal uttered by a person is complicated.

[Configuration of the acoustic echo removal apparatus]
Next, the configuration of the acoustic echo removal apparatus shown in FIGS. 1 and 2 is described with reference to FIG. 3. FIG. 3 is a block diagram illustrating the configuration of the acoustic echo removal apparatus according to the first embodiment. As shown in FIG. 3, the acoustic echo removal apparatus 10 includes a microphone 11, a transfer characteristic generation unit 12, a transfer characteristic DB 13, a reference signal reception unit 20, an acoustic echo estimation unit 21, an acoustic echo removal unit 22, and an acoustic echo removal result output unit 23.

The microphone 11 collects the various signals output in the vehicle. In the example above, the microphone 11 collects, as the observation signal, the reference signals output from the four speakers (2ch) at the same timing as the reference signals used when the acoustic echo estimation unit 21 (described later) estimates the acoustic echo, together with the utterance signal uttered by a person at that same timing, and outputs the collected observation signal to the acoustic echo removal unit 22 (described later). For example, let the 2ch reference signals be x_L and x_R, let the transfer characteristics from each speaker to the microphone 11 in in-vehicle environment i be h^(i)_FL, h^(i)_FR, h^(i)_RL, and h^(i)_RR, and let the driver's speech be s. The observation signal y^(i) collected by the microphone 11 can then be defined as Equation (1). Here, N^(i) is the estimated acoustic echo and can be defined as Equation (2).
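Equations (1) and (2) are not reproduced in this text. One form consistent with the definitions above is shown below as a sketch only. It assumes that * denotes convolution, that the left-side speakers (FL, RL) are driven by x_L, and that the right-side speakers (FR, RR) are driven by x_R; the exact form in the original publication may differ.

```latex
\begin{align*}
y^{(i)} &= s + N^{(i)} \\
N^{(i)} &= h^{(i)}_{FL} * x_L + h^{(i)}_{RL} * x_L + h^{(i)}_{FR} * x_R + h^{(i)}_{RR} * x_R
\end{align*}
```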

The transfer characteristic generation unit 12 measures, in advance, the transfer characteristics for each situation that can be assumed in the environment in which acoustic echo removal is performed to extract an utterance signal uttered by a person from the observation signal representing the sound collected by the microphone 11, and holds the measured transfer characteristics. In the example above, in a vehicle that is equipped with four speakers (FR, RR, FL, RL) and in which speech recognition is performed, the transfer characteristic generation unit 12 measures impulse responses for the 12 assumable occupant seating patterns, obtains the transfer characteristics in advance from the measured impulse responses (reference: Sato, Journal of the Acoustical Society of Japan, Vol. 58, No. 10, pp. 669-676, 2002), and stores them in the transfer characteristic DB 13 (described later). The transfer characteristic generation unit 12 may run every time an utterance is made, or at predetermined timings, for example every 10 minutes.

The transfer characteristic DB 13 stores the transfer characteristics generated by the transfer characteristic generation unit 12. In the example above, it stores the 12 assumable transfer characteristic patterns generated and stored by the transfer characteristic generation unit 12.

The reference signal reception unit 20 receives the reference signal, which represents the sound signal output into the environment over a plurality of channels, with the channels kept as they are. In the example above, the reference signal reception unit 20 receives the 2ch reference signal, consisting of the reference signal (1ch) output from the speakers on the right side of the vehicle (FR, RR) and the reference signal (1ch) output from the speakers on the left side of the vehicle (FL, RL), as a 2ch stereo signal without combining it into a 1ch monaural signal. It then outputs the received 2ch reference signal to the acoustic echo estimation unit 21 (described later).

For each of the multi-channel reference signals received by the reference signal reception unit 20, the acoustic echo estimation unit 21 estimates the acoustic echo, which represents sound other than the uttered speech, using each of the held transfer characteristics. In the example above, for each of the 2ch reference signals output from the left and right speakers and received by the reference signal reception unit 20, the acoustic echo estimation unit 21 acquires each of the transfer characteristics for the 12 occupant seating patterns held in the transfer characteristic DB 13, inputs the acquired transfer characteristics to the acoustic echo removal unit 22 (described later), and estimates 12 patterns of acoustic echo representing sound other than the uttered speech.

For example, let N'^(i) be the acoustic echo to be estimated, h'^(i)_L the transfer characteristic corresponding to the FL and RL speakers, and h'^(i)_R the transfer characteristic corresponding to the FR and RR speakers. The acoustic echo estimation unit 21 can then estimate the acoustic echo as Equation (3) (i = 1 to 12, for the 12 patterns).
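Equation (3) is not reproduced in this text. Under the assumption that each stored transfer characteristic is applied to its reference channel by convolution (written *), and that x_L and x_R are the left and right reference signals, a form consistent with the symbols defined above would be the following; it is a sketch, not the published equation.

```latex
\[
N'^{(i)} = h'^{(i)}_{L} * x_L + h'^{(i)}_{R} * x_R, \qquad i = 1, \dots, 12
\]
```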

The acoustic echo removal unit 22 is a filter that performs speech recognition using each of the acoustic echoes estimated by the acoustic echo estimation unit 21. In the example above, the observation signal is the signal in which the microphone 11 has collected together the reference signals output from the four speakers, as received by the reference signal reception unit 20, and the utterance signal uttered by an occupant while those reference signals were being output. The acoustic echo removal unit 22 applies the 12 acoustic echo patterns estimated by the acoustic echo estimation unit 21 to this observation signal to perform acoustic echo removal, and obtains 12 patterns of acoustic echo removal results (utterance signal extraction results). It then outputs the 12 results to the acoustic echo removal result output unit 23 (described later). For example, when the microphone 11 collects the observation signal of Equation (1) and the acoustic echo estimation unit 21 estimates the acoustic echo of Equation (2), the acoustic echo removal unit 22 can calculate the acoustic echo removal result with Equation (4).
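Equation (4) is not reproduced in this text. If removal is taken to be a direct subtraction of each estimated echo from the observation signal, which is an assumption here, the i-th result could be written as below. The symbol \hat{s}^{(i)} for the i-th removal result, y for the observation signal, and N'^{(i)} for the i-th echo estimate are notational choices made for this sketch.

```latex
\[
\hat{s}^{(i)} = y - N'^{(i)}, \qquad i = 1, \dots, 12
\]
```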

From the results of acoustic echo removal by the acoustic echo removal unit 22, the acoustic echo removal result output unit 23 selects the result with the highest speech likelihood, which represents the strength of the features of the uttered speech. In the example above, for each of the 12 extraction results produced by the acoustic echo removal unit 22, the acoustic echo removal result output unit 23 calculates a speech likelihood from acoustic features such as MFCC and selects the acoustic echo removal result with the highest calculated speech likelihood. It then outputs the selected result to the speech recognition engine, and the speech recognition engine executes various operations based on the received acoustic echo removal result.

For example, the acoustic echo removal result output unit 23 calculates the MFCC for each of the 12 extraction results produced by the acoustic echo removal unit 22, shown on the left side of Equation (4). Using the MFCC as the calculated acoustic feature, it then calculates the speech likelihood by comparison with a speech GMM (Gaussian Mixture Model) prepared in advance. Letting the speaker's MFCC feature be "o", the speech likelihood P(o) is obtained as a weighted sum of W normal distributions, as in Equation (5). Of the W normal distributions, the ω-th has mean "μ_ω" and variance "μ_σ". λ_ω is given by Equation (6).
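A weighted sum of normal distributions of this kind can be evaluated as in the sketch below. Several points are assumptions rather than details from the source: the covariances are taken to be diagonal, the result is returned in the log domain for numerical stability, and the MFCC extraction itself is not shown. The name `gmm_log_likelihood` and its parameters are hypothetical.

```python
import math
from typing import Sequence


def gmm_log_likelihood(
    feature: Sequence[float],
    weights: Sequence[float],
    means: Sequence[Sequence[float]],
    variances: Sequence[Sequence[float]],
) -> float:
    """Log of a weighted sum of diagonal-covariance Gaussians at `feature`."""
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        log_pdf = 0.0
        for o_d, m_d, v_d in zip(feature, mu, var):
            log_pdf += -0.5 * (math.log(2.0 * math.pi * v_d) + (o_d - m_d) ** 2 / v_d)
        log_terms.append(math.log(w) + log_pdf)
    peak = max(log_terms)  # log-sum-exp keeps the sum numerically stable
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))
```

A feature vector closer to the component means yields a higher value, which is the property the selection step relies on.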

More specifically, for the observation signal y^(i) of Equation (7) observed in in-vehicle environment i, the acoustic echo removal unit 22 uses the 12 acoustic echo patterns estimated by the acoustic echo estimation unit 21 to calculate 12 cancellation results in which the acoustic echo has been cancelled from the observation signal. Using Equations (5) and (6), the acoustic echo removal result output unit 23 then calculates the MFCC features from the 12 cancellation results. Here, letting λ_s be the acoustic model of each speaker, the acoustic echo removal result output unit 23 calculates the value shown on the left side of Equation (8). It selects the "i" obtained at this point (for example, the fourth pattern if i = 4) as the result with the highest speech likelihood and outputs that result (for example, the acoustic echo removal result obtained with the fourth pattern if i = 4) to the speech recognition engine.
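Equation (8) is not reproduced in this text. Read as a maximum-likelihood choice over the 12 patterns, the selection could be written as follows. The symbols \hat{\imath} for the selected pattern and o^{(i)} for the MFCC features of the i-th cancellation result are assumptions introduced for this sketch; λ_s is the speaker acoustic model named in the text.

```latex
\[
\hat{\imath} = \operatorname*{arg\,max}_{i \in \{1, \dots, 12\}} P\left(o^{(i)} \mid \lambda_s\right)
\]
```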

[Processing by the acoustic echo removal apparatus]
Next, the processing performed by the acoustic echo removal apparatus is described with reference to FIG. 4. FIG. 4 is a flowchart showing the flow of acoustic echo removal processing in the acoustic echo removal apparatus according to the first embodiment.

As shown in FIG. 4, the acoustic echo removal apparatus 10 applies to a filter the transfer characteristics measured in advance under situations that can be assumed in an actual vehicle environment (step S101). As a specific example, the acoustic echo removal apparatus 10 applies to the filter 12 patterns of transfer characteristics measured for 12 patterns of occupant arrangement that can be assumed in an actual vehicle environment.

The acoustic echo removal apparatus 10 then inputs the received 2ch reference signals as-is to the filter and estimates the acoustic echo (steps S102 and S103). As a specific example, for each of the received 2ch reference signals output from the left and right speakers, the acoustic echo removal apparatus 10 inputs each of the held transfer characteristics for the 12 patterns of occupant arrangement into the filter used for speech recognition, and estimates 12 patterns of acoustic echo representing sound other than the uttered speech.

Subsequently, the acoustic echo removal apparatus 10 removes the estimated acoustic echo from the observation signal (step S104). As a specific example, the observation signal is the signal collected together by the microphone, consisting of the reference signals output from the four speakers (reference signals output from the speakers at the same timing as the reference signals used when estimating the acoustic echo) and the speech signal uttered by an occupant while those reference signals are being output. The acoustic echo removal apparatus 10 performs acoustic echo removal on this observation signal using a filter to which the estimated 12 patterns of acoustic echo are applied, and extracts 12 patterns of acoustic echo removal results (speech signal extraction results).

The acoustic echo removal apparatus 10 then calculates, from the results of acoustic echo removal, a speech likelihood representing the strength of the features of uttered speech (step S105). As a specific example, for each of the 12 patterns of extraction results from which the acoustic echo has been removed, the acoustic echo removal apparatus 10 calculates a speech likelihood representing the acoustic features, using a function such as MFCC.

Thereafter, the acoustic echo removal apparatus 10 selects and outputs the result with the highest calculated speech likelihood (step S106). As a specific example, the acoustic echo removal apparatus 10 calculates the speech likelihood representing the acoustic features, selects the acoustic echo removal result with the highest calculated speech likelihood, and outputs it to the speech recognition engine.
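The flow of steps S101 to S106 can be sketched as follows. This is an illustrative outline under stated assumptions, not the patented implementation: each stored transfer characteristic is assumed to be a pair of FIR impulse responses (one per reference channel), the echo is assumed to be the sum of the two filtered channels, and the speech likelihood is supplied by the caller as a function. The names `convolve`, `cancel_and_select`, `transfer_db`, and `speech_score` are hypothetical.

```python
def convolve(signal, impulse_response):
    """Causal FIR filtering of `signal`, truncated to len(signal) samples."""
    out = [0.0] * len(signal)
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(impulse_response):
            if n - k < 0:
                break
            acc += h * signal[n - k]
        out[n] = acc
    return out


def cancel_and_select(observed, ref_left, ref_right, transfer_db, speech_score):
    """Try every stored situation and keep the most speech-like residual.

    transfer_db:  dict mapping a situation name to (h_left, h_right),
                  the assumed per-channel impulse responses (S101).
    speech_score: callable returning a higher value for more speech-like
                  signals (stands in for the speech likelihood of S105).
    Returns (situation_name, score, residual).
    """
    best = None
    for name, (h_left, h_right) in transfer_db.items():
        # S102/S103: estimate the echo from each reference channel as-is.
        echo_left = convolve(ref_left, h_left)
        echo_right = convolve(ref_right, h_right)
        # S104: remove the estimated echo from the observation signal.
        residual = [y - el - er for y, el, er in zip(observed, echo_left, echo_right)]
        # S105: score the residual.
        score = speech_score(residual)
        # S106: keep the result with the highest score.
        if best is None or score > best[1]:
            best = (name, score, residual)
    return best
```

Passing the scoring function in as a parameter keeps the sketch independent of any particular feature extraction or acoustic model.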

[Effects of the first embodiment]
As described above, according to the first embodiment, in an in-vehicle multi-speaker environment, the conceivable transfer characteristics are calculated in advance, the 2ch reference signals are received independently, the acoustic echo is estimated from the received 2ch reference signals and the transfer characteristics calculated in advance, and the acoustic echo is cancelled from the observation signal. As a result, the acoustic echo can be removed with high accuracy even when the environment in which acoustic echo removal is performed to extract the speech signal uttered by a person is complex.

Although embodiments of the present invention have been described so far, the present invention may be implemented in various forms other than the embodiments described above. Different embodiments are therefore described below, divided into (1) selection of transfer characteristics by weight, (2) speech likelihood selection method, (3) system configuration and the like, and (4) program.

(1) Selection of transfer characteristics by weight
For example, the first embodiment describes the case where acoustic echo is estimated and speech recognition is performed using the transfer characteristics of all 12 patterns measured in advance. The present invention is not limited to this: the occupant arrangement can be estimated based on the weight detected by load sensors provided on the seats in the vehicle, and speech recognition can be performed using the transfer characteristics of the estimated occupant arrangement.

As a specific example, as shown in FIG. 5, the acoustic echo removal apparatus detects, by means of the load sensors provided on each seat in the vehicle, that the driver's seat and the rear seat each carry the weight of one or more persons. The acoustic echo removal apparatus then identifies two patterns, "situation 8" and "situation 11", from the 12 patterns of occupant arrangement as the current occupant arrangement. Using the transfer characteristics of only the two identified patterns, the acoustic echo removal apparatus estimates two patterns of acoustic echo and performs speech recognition. This makes the processing faster than when speech recognition is performed using all 12 patterns of situations. FIG. 5 is a diagram showing an example of estimating transfer characteristics using load sensors.
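The narrowing step can be sketched as a lookup that keeps only the stored situations whose occupied seating zones match the zones where load is detected. The definitions of the 12 situations are not given in this text, so the situation table, the zone names, the 20 kg threshold, and the function name `candidate_situations` below are all hypothetical and chosen only for illustration.

```python
def candidate_situations(zone_load_kg, situations, threshold_kg=20.0):
    """Return the IDs of situations consistent with the load sensor readings.

    zone_load_kg: dict mapping a seating zone to the measured load in kg.
    situations:   dict mapping a situation ID to a dict of
                  {zone: number of occupants} (hypothetical layout).
    threshold_kg: assumed minimum load for a zone to count as occupied.
    """
    loaded = frozenset(z for z, kg in zone_load_kg.items() if kg >= threshold_kg)
    matches = []
    for sid, occupants in situations.items():
        occupied = frozenset(z for z, count in occupants.items() if count > 0)
        # A load sensor reports that a zone is occupied, not by how many
        # people, so several situations can match the same reading.
        if occupied == loaded:
            matches.append(sid)
    return sorted(matches)


# Hypothetical excerpt of a situation table.
SITUATIONS = {
    1: {"driver": 1},
    5: {"driver": 1, "passenger": 1},
    8: {"driver": 1, "rear": 1},
    11: {"driver": 1, "rear": 2},
}

print(candidate_situations({"driver": 65.0, "passenger": 0.0, "rear": 120.0}, SITUATIONS))
```

Only the returned situations would then be used for echo estimation, which is where the reduction in processing comes from.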

In addition to occupant arrangement and load, the acoustic echo removal apparatus can use temperature, running noise, air conditioner sound, seat position, and the like as parameters for estimating the acoustic echo. Doing so allows the transfer characteristics to be measured more accurately, so that acoustic echo removal can be performed with higher accuracy even when the environment in which speech recognition is performed is complex.

(2) Speech likelihood selection method
The first embodiment describes the case where 12 patterns of acoustic echo are estimated using the transfer characteristics of all 12 patterns measured in advance, acoustic echo removal is performed, and the result with the highest speech likelihood is selected from the 12 patterns of acoustic echo removal results. The present invention is not limited to this.

For example, as shown in FIG. 6, the acoustic echo removal apparatus first uses the transfer characteristics for the minimum number of occupants (one) and the maximum number of occupants (five) to estimate the acoustic echo, perform speech recognition, and calculate the speech likelihood for each. If the speech likelihood for one occupant is greater than that for five occupants, the acoustic echo removal apparatus next uses the transfer characteristics for two occupants and for three occupants to estimate the acoustic echo, perform speech recognition, and calculate the speech likelihood for each. The acoustic echo removal apparatus then selects the greatest speech likelihood among the cases of one, two, and three occupants. FIG. 6 is a diagram showing an example of the speech likelihood selection method.

Conversely, if the speech likelihood for five occupants is greater than that for one occupant, the acoustic echo removal apparatus next uses the transfer characteristics for three occupants and for four occupants to estimate the acoustic echo, perform speech recognition, and calculate the speech likelihood for each. The acoustic echo removal apparatus then selects the greatest speech likelihood among the cases of three, four, and five occupants.

In this way, by starting from the situations whose transfer characteristics differ and narrowing down step by step, the acoustic echo is estimated for situations close to the actual occupant arrangement, speech recognition is performed, and the speech likelihood is calculated. Compared with processing all 12 patterns together, this reduces the processing load, and acoustic echo removal can be performed with higher accuracy even when the environment in which it is performed is complex.
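The two-stage narrowing by occupant count can be sketched as below. The function `score_for_count` is a hypothetical callable that runs echo estimation and removal for a given number of occupants and returns the resulting speech likelihood. The source describes only the two strict cases; sending a tie to the three-to-five branch is an assumption of this sketch.

```python
def select_by_occupant_count(score_for_count):
    """Coarse-to-fine search over occupant counts 1..5.

    score_for_count: callable taking an occupant count and returning the
                     speech likelihood obtained with that count's transfer
                     characteristics (hypothetical interface).
    Returns (best_count, scores), where scores holds only the counts
    that were actually evaluated.
    """
    # Stage 1: evaluate the two extremes (minimum and maximum occupancy).
    scores = {1: score_for_count(1), 5: score_for_count(5)}
    if scores[1] > scores[5]:
        follow_up, candidates = (2, 3), (1, 2, 3)
    else:
        # Includes the tie case, which the source does not specify.
        follow_up, candidates = (3, 4), (3, 4, 5)
    # Stage 2: evaluate two further counts on the more likely side.
    for count in follow_up:
        scores[count] = score_for_count(count)
    best = max(candidates, key=lambda count: scores[count])
    return best, scores
```

Each call evaluates four occupant counts instead of five, and the saving grows when every count corresponds to several occupant arrangements.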

The methods shown in the first embodiment and FIG. 6 are merely examples; cases where the maximum number of occupants is eight or ten, for instance, can be processed in the same way. In that case, there are 12 or more patterns of occupant arrangement, and the acoustic echo is estimated using all of the patterns, acoustic echo removal is performed, and the speech likelihood is calculated.

(3) System configuration and the like
Of the processes described in this embodiment, all or part of the processes described as being performed automatically (for example, the transfer characteristic measurement process) can also be performed manually. In addition, the processing procedures, control procedures, specific names, and information including various data and parameters shown in the above description and drawings can be changed arbitrarily unless otherwise specified.

The components of each illustrated apparatus are functional and conceptual, and need not be physically configured as illustrated. That is, the specific form of distribution and integration of each apparatus is not limited to the illustrated one, and all or part of it can be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like (for example, by integrating the acoustic echo removal unit and the acoustic echo removal result output unit). Furthermore, all or any part of each processing function performed in each apparatus can be realized by a CPU and a program analyzed and executed by the CPU, or realized as hardware using wired logic.

(4) Program
The acoustic echo removal method described in this embodiment can be realized by executing a program prepared in advance on a computer such as a personal computer or a workstation. This program can be distributed via a network such as the Internet. The program can also be recorded on a computer-readable recording medium such as a hard disk, a flexible disk (FD), a CD-ROM, an MO, or a DVD, and executed by being read from the recording medium by a computer.

As described above, the acoustic echo removal apparatus, in-vehicle device, and acoustic echo removal method according to the present invention are useful for performing speech recognition that extracts a speech signal uttered by a person from an observation signal representing a collected sound signal. They are particularly suited to removing acoustic echo with high accuracy even when the environment in which acoustic echo removal is performed to extract the speech signal uttered by a person is complex.

FIG. 1 is a diagram for explaining the outline and features of the acoustic echo removal apparatus according to the first embodiment.
FIG. 2 is a diagram for explaining the outline and features of the acoustic echo removal apparatus according to the first embodiment.
FIG. 3 is a block diagram showing the configuration of the acoustic echo removal apparatus according to the first embodiment.
FIG. 4 is a flowchart showing the flow of acoustic echo removal processing in the acoustic echo removal apparatus according to the first embodiment.
FIG. 5 is a diagram showing an example of estimating transfer characteristics using load sensors.
FIG. 6 is a diagram showing an example of the speech likelihood selection method.
FIG. 7 is a diagram for explaining the prior art.

Explanation of symbols

10 Acoustic echo removal apparatus
11 Microphone
12 Transfer characteristic generation unit
13 Transfer characteristic DB
20 Reference signal reception unit
21 Acoustic echo estimation unit
22 Acoustic echo removal unit
23 Acoustic echo removal result output unit

Claims

An acoustic echo removal apparatus comprising:
transfer characteristic holding means for measuring transfer characteristics in advance for each conceivable situation in an environment in which acoustic echo removal is performed to extract a speech signal uttered by a person from an observation signal representing a collected sound signal, and for holding the measured transfer characteristics;
reference signal receiving means for receiving, as-is in the plurality of channels, a reference signal representing a sound signal output in a plurality of channels in the environment;
acoustic echo estimation means for estimating acoustic echo representing sound other than the speech signal, using each of the transfer characteristics held by the transfer characteristic holding means for each of the reference signals of the plurality of channels received by the reference signal receiving means;
acoustic echo removal means for performing the acoustic echo removal using each acoustic echo estimated by the acoustic echo estimation means; and
acoustic echo removal result selection means for selecting, from the results of acoustic echo removal by the acoustic echo removal means, the result with the highest speech likelihood representing the strength of the features of the speech signal.

The acoustic echo removal apparatus according to claim 1, wherein the acoustic echo removal result selection means outputs the selected result with the highest speech likelihood to a speech recognition engine that is connected to the acoustic echo removal apparatus and executes various processes using the acoustic echo removal result.

An in-vehicle device provided with an acoustic echo removal apparatus, the acoustic echo removal apparatus comprising:
transfer characteristic holding means for measuring transfer characteristics in advance for each conceivable occupant arrangement situation in a vehicle in which acoustic echo removal is performed to extract a speech signal uttered from the driver's seat from an observation signal representing a collected sound signal, and for holding the measured transfer characteristics;
reference signal receiving means for receiving, as-is in the two channels, a reference signal representing a sound signal output in two channels in the vehicle;
acoustic echo estimation means for estimating acoustic echo representing sound other than the speech signal, using each of the transfer characteristics held by the transfer characteristic holding means for each of the two-channel reference signals received by the reference signal receiving means;
acoustic echo removal means for performing the acoustic echo removal using each acoustic echo estimated by the acoustic echo estimation means; and
acoustic echo removal result selection means for selecting, from the results of acoustic echo removal by the acoustic echo removal means, the result with the highest speech likelihood representing the strength of the features of the speech signal.

The in-vehicle device according to claim 3, wherein the transfer characteristic holding means measures the transfer characteristics in advance for each situation including, in addition to the occupant arrangement, temperature, running noise, air conditioner sound, and seat position, and holds the measured transfer characteristics.

The in-vehicle device, wherein the acoustic echo estimation means, for each of the two-channel reference signals received by the reference signal receiving means, estimates the occupant arrangement based on the weight detected by load sensors provided on the seats in the vehicle, and estimates the acoustic echo representing sound other than the speech signal using, among all the transfer characteristics held by the transfer characteristic holding means, the transfer characteristics of the estimated occupant arrangement.

An acoustic echo removal method in an in-vehicle device, comprising:
a transfer characteristic holding step of measuring transfer characteristics in advance for each conceivable occupant arrangement situation in a vehicle in which acoustic echo removal is performed to extract a speech signal uttered from the driver's seat from an observation signal representing a collected sound signal, and of holding the measured transfer characteristics;
a reference signal receiving step of receiving, as-is in the two channels, a reference signal representing a sound signal output in two channels in the environment;
an acoustic echo estimation step of estimating acoustic echo representing sound other than the speech signal, using each of the transfer characteristics held in the transfer characteristic holding step for each of the two-channel reference signals received in the reference signal receiving step;
an acoustic echo removal step of performing the acoustic echo removal using each acoustic echo estimated in the acoustic echo estimation step; and
an acoustic echo removal result selection step of selecting, from the results of acoustic echo removal in the acoustic echo removal step, the result with the highest speech likelihood representing the strength of the features of the speech signal.