JP2007228028A

JP2007228028A - Evaluation method of acoustic echo canceller, and storage medium for use therein

Info

Publication number: JP2007228028A
Application number: JP2006043779A
Authority: JP
Inventors: Naoto Yamashita; 直人山下
Original assignee: NEC Electronics Corp
Current assignee: NEC Electronics Corp
Priority date: 2006-02-21
Filing date: 2006-02-21
Publication date: 2007-09-06

Abstract

PROBLEM TO BE SOLVED: To provide a simple acoustic echo canceller (AEC) evaluation method that does not require a special device, and a medium for use therein.

SOLUTION: The acoustic echo canceller is evaluated by a method including a step of preparing, as temporally synchronized data, near-end data, far-end data, and synthesized voice data obtained by combining the near-end data with reverberant voice data obtained from the far-end data through a predetermined acoustic system. This yields a simple AEC evaluation method that requires no special device for evaluation, and a medium for use therein.

COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to a method for evaluating an acoustic echo canceller and to a storage medium used therefor.

An acoustic echo canceller removes the echo that arises between the speaker and the microphone of a mobile phone, and is an indispensable filter particularly for telephones equipped with a hands-free system. In recent years, demand for acoustic echo cancellers has grown with the spread of mobile phones.

Here, the function of a general acoustic echo canceller (hereinafter also abbreviated as AEC) will be described. In the functional explanatory diagram of FIG. 1, the AEC is provided in a telephone 10 such as a mobile phone. The acoustic echo canceller normally operates in the following five stages. First stage: the far-end speaker's voice 1 (solid black arrow) enters the telephone from the telephone network and is input to the acoustic echo canceller. Second stage: the far-end speaker's voice is copied inside the acoustic echo canceller and output to the speaker. Because the buffer capacity available for holding this data is limited, the oldest data is normally discarded first.

Third stage: the far-end speaker's voice output from the speaker echoes, flows into the microphone, and is mixed with the near-end speaker's voice (outlined arrow). The echo attenuates the amplitude of the far-end speaker's voice and introduces a time delay. The degree of attenuation and delay varies greatly with the conditions of the near-end room acoustic system 20 (for example, indoors, outdoors, or inside a train).

Fourth stage: the near-end speaker's voice and the far-end speaker's voice 2 are input to the acoustic echo canceller in a mixed state. The acoustic echo canceller estimates the far-end speaker's voice 2 from the far-end speaker's voice copied in the second stage and subtracts the estimate from the mixed voice, thereby extracting only the near-end speaker's voice. Fifth stage: the extracted near-end speaker's voice is output to the telephone network.
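The estimate-and-subtract behaviour of the fourth stage can be illustrated with a minimal sketch. The source does not name the adaptation algorithm used inside the AEC; the normalized LMS (NLMS) filter below is swapped in purely as a common textbook choice, and the function name, tap count, step size, and toy echo path are all assumptions made for illustration.

```python
import random

def nlms_echo_canceller(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of far_end from mic (NLMS sketch)."""
    weights = [0.0] * taps   # adaptive estimate of the echo path (assumed FIR)
    history = [0.0] * taps   # copied far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]   # limited buffer: the oldest sample is discarded
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate          # what remains is the near-end estimate
        norm = sum(h * h for h in history) + eps
        weights = [w + mu * e * h / norm for w, h in zip(weights, history)]
        residual.append(e)
    return residual

random.seed(0)
far = [random.uniform(-1.0, 1.0) for _ in range(4000)]
# Hypothetical echo path: the far-end voice attenuated to 0.6 and delayed by 3 samples.
echo = [0.0] * 3 + [0.6 * s for s in far[:-3]]
out = nlms_echo_canceller(far, echo)   # toy case with a silent near-end speaker
tail_power = sum(s * s for s in out[-500:]) / 500
echo_power = sum(s * s for s in echo[-500:]) / 500
print(tail_power < 0.01 * echo_power)
```

In this toy case the near-end speaker is silent, so the residual should decay toward zero once the filter has adapted; with real near-end speech present the residual would instead approximate that speech.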

The performance of the acoustic echo canceller depends on whether the "estimation of the far-end speaker's voice" in the fourth stage is carried out properly. Therefore, when an acoustic echo canceller is implemented, evaluating its performance is essential.

In addition, when a mobile phone is used in an automobile, for example, a low-performance acoustic echo canceller degrades call quality. Moreover, as development periods for portable electronic devices such as mobile phones have shortened, rapid development of acoustic echo cancellers has also been required.

However, as described above, an acoustic echo canceller had to be implemented and its performance evaluated individually. Because this was conventionally done case by case, evaluation took a long time. To address this problem, Patent Document 1 stores speech in a database as evaluation voice data, and a voice generation simulator uses it for every evaluation, thereby improving reproducibility.

The prior art disclosed in Patent Document 1 is outlined below with reference to the block diagram of FIG. 2. To reduce the time required for tuning a hands-free device, evaluation voices are stored in databases in advance and used repeatedly, which improves the reproducibility of the hands-free device evaluation. For this purpose, three databases are prepared: a near-end voice database, a far-end voice database, and a database representing the characteristics of the room acoustic system (corresponding to the near-end room acoustic system 20 in FIG. 1). By evaluating the acoustic echo canceller (AEC) with these databases, the prior art aims to improve the tuning efficiency of the hands-free device.

Patent Document 1: JP 2004-281419 A

In evaluating an acoustic echo canceller, near-end and far-end sounds (human voices, ambient sounds, and the like) must each be input, as described above. In the prior art, however, the near-end and far-end voice data are managed as separate data, so improving reproducibility requires a special device that inputs them in synchronization.

This point is explained in more detail below. In the prior art disclosed in Patent Document 1, the far-end sound source database of FIG. 2 stores the voices generated on the far-end side. Passing them through a telephone network simulator generates far-end voice data, which is input to the acoustic echo canceller under evaluation. The near-end room acoustic database stores the acoustic characteristics of each near-end room acoustic system (inside a train, inside a room, and so on). Passing these characteristics through the near-end room simulator reproduces the echo of the far-end voice. This echo is added to the voice from the near-end sound source database, and the sum is input to the acoustic echo canceller to operate it.

This prior art stores the far-end sound source, the near-end sound source, and the near-end room acoustic system characteristics in databases, and reproduces the voices through dedicated simulators on both the far-end (network) side and the near-end side to operate the acoustic echo canceller. Evaluation voice data of the same quality can therefore always be prepared.

To realize this, however, the far-end voice and the near-end voice must be input to the acoustic echo canceller with aligned timing. The prior art described above does not specifically disclose this point, but it raises the problem that a special device is required to synchronize the timing of the near-end and far-end voice data and input them to the acoustic echo canceller.

The present invention aims to provide a simple AEC evaluation method that solves the prior-art problem of requiring a special device for evaluation, and a medium used therefor. Other effects of the present invention are described later.

The acoustic echo canceller evaluation method of the present invention comprises: a step of preparing, as temporally synchronized data, near-end voice data, far-end voice data, and synthesized voice data obtained by combining the near-end voice data with reverberant voice data obtained from the far-end voice data via a predetermined acoustic system; and a step of inputting the far-end voice data and the synthesized voice data to the acoustic echo canceller and causing it to output processed voice data obtained by performing echo cancellation on the synthesized voice data.

In other words, by using near-end voice data, far-end voice data, and the synthesized voice data prepared as temporally synchronized data, the present invention solves the problem described above, namely that a special device is required to synchronize the timing of the near-end and far-end voice data and input them to the acoustic echo canceller.

According to the present invention, a simple AEC evaluation method that requires no special device for evaluation, and a storage medium used therefor, are obtained.

The gist of the first embodiment of the present invention will be described with reference to FIG. 3. FIG. 3 shows an outline of the voice data file used in the first embodiment and its relationship to the inputs and outputs of the AEC 10. The input data 1 contains far-end voice data, reverberant voice data, and near-end voice data, all as digitized data. These voice data are sampled at a predetermined rate in accordance with the sampling theorem. Here, the reverberant voice data is the far-end voice data with the near-end room acoustic system characteristics applied. The input data 1 is in one file format. Data in file format means a collection of data recorded on a storage medium such as a hard disk, floppy (registered trademark) disk, or CD-ROM. In the present invention, any type of storage medium can be used. Here, "one file format" means a single file format, and the data may be contained in files having a plurality of numbers (names). Of course, the data may also be recorded and stored in one file in the usual sense, with a single file name, since keeping the data in one file with the same number (name) has the advantage that the data recorded and stored there can be managed together more easily.
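One way to picture "one file" holding the three synchronized tracks is a multi-channel WAV file in which each frame carries one sample of each track, so the tracks cannot drift apart. This is only a sketch under assumptions: the source does not specify WAV, 16-bit samples, the 8000 Hz rate, or the channel order, and the function names are hypothetical.

```python
import io
import struct
import wave

def write_evaluation_file(target, far_end, reverb, near_end, rate=8000):
    """Interleave three equally long 16-bit tracks into one 3-channel WAV."""
    if not (len(far_end) == len(reverb) == len(near_end)):
        raise ValueError("tracks must have the same number of samples")
    frames = bytearray()
    for triple in zip(far_end, reverb, near_end):
        frames += struct.pack("<3h", *triple)   # one frame = one sample per track
    with wave.open(target, "wb") as wav:
        wav.setnchannels(3)      # assumed order: far-end, reverberant, near-end
        wav.setsampwidth(2)      # 16-bit PCM (assumption)
        wav.setframerate(rate)   # "predetermined rate"; 8000 Hz is an assumption
        wav.writeframes(bytes(frames))

def read_evaluation_file(source):
    """Return (far_end, reverb, near_end) as lists of integer samples."""
    with wave.open(source, "rb") as wav:
        count = wav.getnframes()
        raw = wav.readframes(count)
    samples = struct.unpack("<%dh" % (3 * count), raw)
    return list(samples[0::3]), list(samples[1::3]), list(samples[2::3])

buffer = io.BytesIO()            # stands in for a file on any storage medium
write_evaluation_file(buffer, [1, 2, 3], [10, 20, 30], [100, 200, 300])
buffer.seek(0)
tracks = read_evaluation_file(buffer)
print(tracks)
```

Because synchronization is implied by the frame index, no separate timing data needs to be managed alongside the audio.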

Here, the near-end room acoustic system characteristics are the acoustic characteristics of the environment in which the speaker listening to the far-end voice data is located, for example inside an automobile, an office, or a gymnasium. The "room" in "room acoustic system characteristics" therefore differs from a room in the ordinary sense and is better understood as a given space. Strictly speaking, the term should simply be "predetermined acoustic system characteristics", but "room acoustic system characteristics" is used here in keeping with conventional usage.

According to FIG. 3, the near-end voice data and the reverberant voice data are combined at the synthesis point 11 and input to the AEC 10 as synthesized voice data. Because the near-end voice data and the reverberant voice data are digital data, they are, as described later, converted to analog data using a DA converter or the like (not shown), physically combined, converted back to digital data using an AD converter or the like (not shown), and then input to the input terminal of the AEC 10.

Using the digital far-end voice data obtained at its other input terminal, that is, the far-end speaker's voice data, the AEC 10 performs subtraction on the synthesized voice data to extract only the near-end speaker's voice, and outputs the result as the processed output. The output data file contains this processed output data, that is, the presumed near-end voice data output from the AEC, together with the pre-processing data. The AEC is evaluated by comparing the processed output data with the actual near-end voice data. The pre-processing data is also referred to at times. Specific embodiments are described later.

The data structure of this input data file will be described in more detail using Table 1. The data file of Table 1 holds three kinds of data: far-end voice data, reverberant voice data, and near-end voice data. Here, the reverberant voice data is obtained by applying specific room acoustic system characteristics to the far-end voice data. The reverberant voice data may be obtained by either actual measurement or simulation, although actual measurement is preferable. As noted above, all of the data are digital.

To describe the reverberant voice data of Table 1 in more detail, it is data measured over a predetermined time, for a specific far-end voice data A1(t), in spaces having various room acoustic system characteristics, and then converted to digital form. For example, file 1 corresponds to the inside of an ordinary passenger car, file 2 to the inside of a large bus, and file 3 to the inside of a train. In this case, f1(t) denotes the room acoustic system of the passenger car, and A1·f1(t) denotes the reverberant voice data affected by that room acoustic system. Likewise, f2(t) denotes the room acoustic system of the large bus, and A1·f2(t) denotes the reverberant voice data affected by it. The remaining files follow the same pattern and hold data measured in various room acoustic systems.
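Where the reverberant voice data is obtained by simulation rather than measurement, one conventional way to model A1·f(t) is to convolve the far-end samples with an impulse response of the acoustic system. The sketch below takes that route as an assumption; the source does not define the "·" operation, and the impulse responses, dictionary keys, and function name are invented for illustration.

```python
def apply_acoustic_system(far_end, impulse_response):
    """Convolve far-end samples with an impulse response, truncated to the input length."""
    out = [0.0] * len(far_end)
    for n in range(len(far_end)):
        acc = 0.0
        for k, h in enumerate(impulse_response):
            if n - k < 0:
                break                     # no far-end sample exists before time 0
            acc += h * far_end[n - k]     # delayed, attenuated copy of the far-end voice
        out[n] = acc
    return out

# Hypothetical impulse responses standing in for f1(t), f2(t): leading zeros model
# the propagation delay, the following taps model attenuated reflections.
acoustic_systems = {
    "passenger_car": [0.0, 0.0, 0.5, 0.2],
    "large_bus": [0.0, 0.0, 0.0, 0.0, 0.3, 0.15, 0.05],
}

a1 = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]   # unit impulse as a toy far-end signal
reverb_files = {
    name: apply_acoustic_system(a1, h) for name, h in acoustic_systems.items()
}
print(reverb_files["passenger_car"])
```

Keeping the same far-end signal and swapping only the impulse response mirrors the layout described for Table 1, where each file differs only in the acoustic system.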

In this case, a single specific data set is also used for the near-end voice data. The far-end and near-end voice data are held fixed because this input file is intended to show how the AEC behaves as the room acoustic system changes. Of course, as described later, the file can be organized in various other ways.

Table 1: Data used in the first embodiment of the present invention.

FIG. 6 shows an example of a specific AEC evaluation using this input data. The input data file holds each voice and the like, recorded after digital conversion. The input data is stored in the storage device 22 of a personal computer (hereinafter, PC) 20 via an input terminal (not shown). Far-end voice data (ID1), reverberant voice data (ID2), and near-end voice data (ID3) are output from output terminals (not shown) of the PC 20.

Of these, the far-end voice data (ID1) is input to the AEC 10 as is. The reverberant voice data (ID2) and the near-end voice data (ID3) pass through the digital-to-analog converter (DAC) 12 and enter the R and L terminals of the synthesizer 13, respectively. The synthesizer uses, for example, the R and L terminals of a stereo speaker with the output side made common, so that the analog signals are summed as voltages and output as a single analog signal. This signal passes through the analog-to-digital converter (ADC) 14 and is input to the AEC 10 as synthesized voice data (ID4).

A stereo speaker is given as an example of the synthesizer 13 simply because it has two input terminals and is convenient for combining these voices; in principle anything may be used. That is, the two data streams need only be DA-converted to analog signals, fed electrically into the same terminal to form one analog signal, and passed through the ADC 14 again to the AEC 10, so the synthesizer need only have two input terminals and deliver a single output terminal. FIG. 5 illustrates the combination of the two input data streams. As FIG. 5 shows, the data are added at the addition point. The synthesis point 11 in FIG. 3 performs the function of the addition point in FIG. 5 and of the synthesizer 13 in FIG. 6.
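The addition point can also be pictured as a sample-by-sample sum. Note that the embodiment performs this sum in the analog domain between the DAC 12 and the ADC 14; the purely digital version below is an assumed equivalent for illustration, and the 16-bit saturation limit and function name are likewise assumptions.

```python
def synthesize(near_end, reverb, limit=32767):
    """Add two synchronized integer tracks sample by sample, saturating at 16 bits."""
    if len(near_end) != len(reverb):
        raise ValueError("tracks must be the same length to stay synchronized")
    mixed = []
    for n, r in zip(near_end, reverb):
        s = n + r                                   # the addition point
        mixed.append(max(-limit - 1, min(limit, s)))  # clamp instead of wrapping around
    return mixed

id3 = [100, -200, 30000]    # toy near-end voice data (ID3)
id2 = [50, 50, 10000]       # toy reverberant voice data (ID2)
id4 = synthesize(id3, id2)  # synthesized voice data (ID4)
print(id4)
```

Saturating rather than wrapping is a design choice made here so that an overly loud sum distorts audibly instead of flipping sign.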

As a result, processed voice data (OD1) is obtained from the AEC 10 as output data. If the AEC 10 behaves ideally, this processed voice data should consist only of the original near-end voice data (ID3), so it can be compared with ID3. The near-end voice data (ID3) to be compared does not appear at the output, but the digital data in the input file can be used for this purpose. Specifically, by configuring the acoustic echo canceller evaluation environment on the PC 20, the data as it was before acoustic echo canceller processing is retained and output together with the processed data, yielding an output file. For example, the AEC 10 normally incorporates a program for synchronizing data, and this may be used. More specifically, the acoustic echo canceller may be evaluated using a DSP evaluation board (not shown) controlled by a PC on which emulator software for the DSP evaluation board is installed.

As another means of evaluating the AEC 10, the synthesized voice data before processing may be compared with the voice data after processing, so the pre-processing data is also returned to the PC 20 via an input terminal (not shown). By creating a file of these comparison data in the PC 20, an AEC evaluation data file is obtained as digital data.
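The comparisons described above (processed output against the near-end reference, and processed output against the pre-processing data) can be reduced to numbers in many ways; the source does not prescribe a metric. The sketch below assumes one simple choice: the power of the echo remaining after processing, and its ratio in decibels to the echo power before processing. The function names and toy data are hypothetical.

```python
import math

def power(signal):
    """Mean squared value of a sequence of samples."""
    return sum(s * s for s in signal) / len(signal)

def evaluate(near_end, synthesized, processed, floor=1e-12):
    """Compare AEC output (OD1) with the reference (ID3) and the input (ID4)."""
    echo_before = power([s - n for s, n in zip(synthesized, near_end)])
    echo_after = power([p - n for p, n in zip(processed, near_end)])
    # floor avoids log(0) when the output matches the reference exactly
    suppression_db = 10.0 * math.log10((echo_before + floor) / (echo_after + floor))
    return {"residual_power": echo_after, "suppression_db": suppression_db}

id3 = [1.0, -1.0, 1.0, -1.0]   # near-end reference
id4 = [2.0, 0.0, 2.0, 0.0]     # reference plus an echo of amplitude 1
od1 = [1.1, -0.9, 1.1, -0.9]   # hypothetical AEC output with 0.1 of echo left
result = evaluate(id3, id4, od1)
print(round(result["suppression_db"], 3))
```

This sample-wise comparison is only meaningful because the three tracks are time-synchronized, which is the property the evaluation data is designed to guarantee.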

As described above, Embodiment 1 allows the AEC to be evaluated easily with a simple configuration. In the prior art, a special device was needed to synchronize the timing of the near-end and far-end voice data and input them to the acoustic echo canceller, and because that device would be realized by software such as the simulator or emulator that runs the evaluation, the evaluation environment became complicated. This problem is also solved.

Next, a second embodiment of the present invention will be described with reference to FIG. 4, FIG. 7, and Table 2. Put simply, in the second embodiment the synthesized voice data is obtained in advance using the PC 20, the DAC 12, the stereo speaker serving as the synthesizer 13, the ADC 14, and so on from the first embodiment, and is stored as a digital data file as shown in FIG. 4, so that various AECs can be evaluated easily.

To avoid repetition, the description focuses on the differences from Embodiment 1. As stated above, the second embodiment differs in that synthesized voice data is used in the input file in place of the reverberant voice data, and no synthesis is performed at evaluation time. The structure of this data file is shown in Table 2. In FIG. 4, the synthesized voice data is therefore input directly to the AEC 10. For the same reason, the DAC 12, the synthesizer 13, and the ADC 14 are unnecessary in FIG. 7. Various AECs can thus be evaluated even more easily.
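With the synthesized voice data already stored in the file, evaluating several AECs amounts to feeding each one the same two tracks and pairing its output with the near-end reference. The harness below is a minimal sketch of that idea; the dictionary layout, the callable interface `aec(far_end, synthesized)`, and both stand-in cancellers are assumptions for illustration, not devices from the source.

```python
def run_evaluation(aec, data):
    """Feed far-end and synthesized tracks to `aec`; pair output with the reference."""
    processed = aec(data["far_end"], data["synthesized"])
    if len(processed) != len(data["near_end"]):
        raise ValueError("AEC output is not aligned with the near-end reference")
    return list(zip(data["near_end"], processed))

def passthrough_aec(far_end, synthesized):
    """Stand-in for a canceller that removes nothing."""
    return list(synthesized)

def known_gain_aec(far_end, synthesized, gain=0.5):
    """Stand-in that subtracts a fixed, already known echo gain with no delay."""
    return [s - gain * f for f, s in zip(far_end, synthesized)]

# Toy evaluation data: echo = 0.5 * far_end, synthesized = near_end + echo.
data = {"far_end": [2.0, 4.0], "near_end": [1.0, 1.0], "synthesized": [2.0, 3.0]}
candidates = {"passthrough": passthrough_aec, "known_gain": known_gain_aec}
reports = {name: run_evaluation(aec, data) for name, aec in candidates.items()}
print(reports["known_gain"])
```

The same `data` object is reused for every candidate, which reflects the point of the second embodiment: no synthesis hardware is involved once the file exists.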

Table 2: Data used in the second embodiment of the present invention.

As noted above, the voice data is prepared as digital data sampled at a predetermined rate in accordance with the sampling theorem. To prepare reverberant voice data in the case of white noise or the like, the data is created using, for example, the sound generation function of a general waveform editing tool. When real voices (such as human voices) are used, the recorded analog signal is sampled at the predetermined sampling rate.

For comparing the processed AEC output with the near-end voice data of the input file, devices that output the left and right channels of stereo data in synchronization are in widespread use, so the two can be output in synchronization without preparing any special device. Displaying them in stereo in a waveform editing tool (with the L-side and R-side waveforms shown separately) makes comparison and checking easy.

In Embodiments 1 and 2, the data used was measured data (reverberant voice data) obtained by measuring once in each of various room acoustic systems. However, different room acoustic systems can also be simulated by variously changing the time synchronization between the reverberant voice data obtained in a single measurement and the near-end voice data. In addition, evaluation of an acoustic echo canceller uses a plurality of voice data sets (near-end and far-end voice) in which the timing of the near-end voice and the far-end voice is slightly shifted. In the prior art, such timing variations could not be included in the evaluation data itself, and the evaluation data was split into near-end voice data, far-end voice data, and timing data, which complicated management. The present invention also solves this problem.
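Generating several data sets that differ only in timing can be sketched as shifting the reverberant track relative to the near-end track before the two are combined. The offsets, function names, and dictionary layout below are assumptions for illustration; the source only states that the time synchronization is varied.

```python
def shift(track, offset):
    """Delay a track by `offset` samples (offset >= 0), keeping its length."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    padded = [0] * offset + list(track)
    return padded[:len(track)]

def timing_variants(far_end, reverb, near_end, offsets):
    """Build one synchronized data set per echo delay, from a single measurement."""
    variants = []
    for off in offsets:
        delayed = shift(reverb, off)
        synthesized = [n + r for n, r in zip(near_end, delayed)]
        variants.append({
            "offset": off,               # kept only as a label for the report
            "far_end": list(far_end),
            "synthesized": synthesized,  # the timing is baked into the audio itself
            "near_end": list(near_end),
        })
    return variants

variants = timing_variants(
    far_end=[1, 2, 3, 4], reverb=[5, 6, 7, 8], near_end=[0, 0, 0, 0], offsets=[0, 2]
)
print(variants[1]["synthesized"])
```

Because each variant carries its timing inside the synthesized track, no separate timing data has to accompany the audio during evaluation.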

FIG. 1 is a diagram explaining the function of an acoustic echo canceller (AEC).
FIG. 2 is a diagram of the prior art.
FIG. 3 is an explanatory diagram of the gist of the first embodiment of the present invention.
FIG. 4 is an explanatory diagram of the gist of the second embodiment of the present invention.
FIG. 5 is an explanatory diagram of voice synthesis.
FIG. 6 is a block diagram of the first embodiment of the present invention.
FIG. 7 is a block diagram of the second embodiment of the present invention.

Explanation of symbols

1 Input data
2 Output data
10 Acoustic echo canceller (AEC)
11 Synthesis point
12 DAC
13 Synthesizer
14 ADC
20 PC
21 CPU
22 Storage device
23 Display device

Claims

1. A method for evaluating an acoustic echo canceller, comprising:
a step of preparing, as temporally synchronized data, near-end voice data, far-end voice data, and synthesized voice data obtained by synthesizing the near-end voice data with reverberant voice data obtained from the far-end voice data via a predetermined acoustic system; and
a step of inputting the far-end voice data and the synthesized voice data to the acoustic echo canceller and causing it to output processed voice data obtained by performing echo cancellation on the synthesized voice data.

2. The method for evaluating an acoustic echo canceller according to claim 1, wherein the step of preparing the data as temporally synchronized data is a step of preparing the near-end voice data, the far-end voice data, and the synthesized voice data, obtained by synthesizing the near-end voice data with the reverberant voice data obtained from the far-end voice data via the predetermined acoustic system, as temporally synchronized data represented in one file format.

3. The method for evaluating an acoustic echo canceller according to claim 1, wherein the step of preparing the data as temporally synchronized data comprises:
a step of preparing the near-end voice data, the far-end voice data, and the reverberant voice data as temporally synchronized data represented in one file format; and
a step of synthesizing the near-end voice data and the reverberant voice data into the synthesized voice data.

4. The method for evaluating an acoustic echo canceller according to any one of claims 1 to 3, wherein the reverberant voice data is generated by performing data processing so as to produce the time delay of the predetermined acoustic system.

5. The method for evaluating an acoustic echo canceller according to claim 1, wherein the predetermined acoustic system is prepared as a plurality of pieces of data obtained by editing, according to a predetermined rule, the time synchronization relationship among the near-end voice data, the far-end voice data, and the reverberant voice data.

6. A storage medium for evaluation on which evaluation data used in a method for evaluating an acoustic echo canceller is recorded, wherein
near-end voice data, far-end voice data, and synthesized voice data obtained by synthesizing the near-end voice data with reverberant voice data obtained from the far-end voice data via a predetermined acoustic system are recorded on the storage medium as temporally synchronized evaluation data, and
the far-end voice data and the synthesized voice data are input to the acoustic echo canceller to evaluate the performance of the acoustic echo canceller.

7. A storage medium for evaluation on which evaluation data used in a method for evaluating an acoustic echo canceller is recorded, wherein
near-end voice data, far-end voice data, and reverberant voice data obtained from the far-end voice data via a predetermined acoustic system are recorded on the storage medium as temporally synchronized evaluation data, and
the far-end voice data and synthesized voice data obtained by synthesizing the near-end voice data and the reverberant voice data are input to the acoustic echo canceller to evaluate the performance of the acoustic echo canceller.