
JP2008022069A - Voice recording apparatus and voice recording method

Info

Publication number: JP2008022069A
Application number: JP2006189664A
Authority: JP
Inventors: Kazumasa Murai; 和昌村井
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2006-07-10
Filing date: 2006-07-10
Publication date: 2008-01-31

Abstract

PROBLEM TO BE SOLVED: To provide a voice recording apparatus capable of recording the voice of each talker without being affected by reverberation.

SOLUTION: The voice recording apparatus 1 includes: a reference signal generating section 5 for generating a reference signal that is input to a loudspeaker 15 located at a sound source position and acting as a sound generating section; an impulse response storage section 6 storing each of the voice signals obtained by recording the sound generated from the loudspeaker 15 with a plurality of temporally synchronized voice pickup sections located at different positions; and a transfer function calculation section 7 for obtaining a transfer function from the position of the sound source to microphones 11 to 14, which act as the voice pickup sections, on the basis of the reference signal and the voice signals stored in the impulse response storage section 6. The voice recording apparatus 1 further includes: an inverse filter coefficient calculation section 8 for calculating inverse filter coefficients that cancel the characteristics of the obtained transfer function; and an inverse filter processing section 9 for filtering the voice signals picked up by the plurality of voice pickup sections on the basis of the inverse filter coefficients.

COPYRIGHT: (C) 2008, JPO & INPIT

Description

The present invention relates to a voice recording apparatus and a voice recording method for recording a target sound.

Conventionally, methods proposed for recording each speaker's voice individually include using a close-talking microphone or a headset microphone, and processing the audio signal based on the characteristics of the speaker's voice. As another conventional technique for remote conferences and the like, an apparatus has been proposed that filters the audio signals received by a plurality of microphones before output, thereby reducing noise and distortion and picking up the sound emitted from a target sound source with high quality (Patent Document 1). The apparatus described in Patent Document 1 estimates the speaker position from the signals received by the microphones. Based on that estimate, it sets delays in the delay units so that the focus of a delay-and-sum array is directed at the estimated speaker position. A signal-to-noise ratio (S/N ratio) is estimated for the audio signal recorded by each microphone, and the estimated S/N ratio is used to determine the filter coefficients. An optimum filter calculation unit calculates a filter that optimizes the S/N ratio of the array output and the distortion of the target sound component, and sets it in the filter.
Patent Document 1: JP 2001-309483 A

However, with the technique described in Patent Document 1, reverberant sound reflected from the walls of the room and the like enters the microphones, so the voice of each speaker cannot be picked up without being affected by reverberation.

The present invention has been made in view of the above problem, and its object is to provide a voice recording apparatus and a voice recording method capable of picking up the voice of each speaker without being affected by reverberant sound.

To solve the above problem, the voice recording apparatus of the present invention includes: a reference signal generating unit that generates a reference signal to be input to a sound generating unit placed at a sound source; a storage unit that stores each of the audio signals obtained when a plurality of sound pickup units, placed at different positions and synchronized in time, pick up the sound generated by the sound generating unit; a transfer function calculation unit that obtains a transfer function from the position of the sound source to the sound pickup units based on the audio signals stored in the storage unit and the reference signal; an inverse filter coefficient calculation unit that calculates inverse filter coefficients that cancel the characteristics of the obtained transfer function; and a filter processing unit that filters the audio signals picked up by the plurality of sound pickup units based on the inverse filter coefficients.

According to the present invention, a reference signal to be input to a sound generating unit placed at a sound source is generated, and the audio signals obtained when a plurality of time-synchronized sound pickup units at different positions pick up the resulting sound are each stored. A transfer function from the sound source position to the sound pickup units is obtained from the stored audio signals and the reference signal, inverse filter coefficients that cancel its characteristics are calculated, and the audio signals are filtered with them. Because this yields audio from which the reverberation has been removed, each speaker's voice can be picked up individually without being affected by reverberant sound.

The filter processing unit picks up the target sound by bringing the sounds arriving at the plurality of sound pickup units into phase and adding them. Preferably, the reference signal is an impulse input and the audio signals picked up by the plurality of sound pickup units are impulse responses. The voice recording apparatus of the present invention further includes a voice recording unit that records the audio signal filtered by the filter processing unit as the audio signal generated at the sound source.

The filter processing unit estimates the position of the sound source based on sound source information indicating that position, and then performs the filter processing. The sound source position can thus be estimated, and the filtering performed, using an image or using sound source information other than an image, such as the positions of chairs or tags that use radio waves, ultrasound, or infrared light. In a room with little reverberation, the source position can also be estimated from the voice itself. The sound source information is, for example, an image signal captured by an imaging unit. Using the image signal narrows down the candidate sound source positions and improves the accuracy of the filter processing.

The voice recording method of the present invention includes the steps of: generating a reference signal to be input to a sound generating unit placed at a sound source; storing in a storage unit each of the audio signals obtained when a plurality of sound pickup units, placed at different positions and synchronized in time, pick up the sound generated by the sound generating unit; obtaining a transfer function from the position of the sound source to the sound pickup units based on the audio signals stored in the storage unit and the reference signal; calculating inverse filter coefficients that cancel the characteristics of the transfer function; and filtering the audio signals picked up by the plurality of sound pickup units based on the inverse filter coefficients. According to the present invention, audio from which reverberation has been removed can be produced using the reference signal, so the speaker's voice can be picked up without being affected by reverberant sound.

The reference signal is an impulse input, and the audio signals picked up by the plurality of sound pickup units are impulse responses. The filtering step estimates the position of the sound source based on sound source information indicating that position, and then performs the filter processing. The sound source position can thus be estimated, and the filtering performed, using an image or using sound source information other than an image, such as the positions of chairs or tags that use radio waves, ultrasound, or infrared light. In a room with little reverberation, the source position can also be estimated from the voice itself. The sound source information is an image signal captured by an imaging unit. Using the image signal narrows down the candidate sound source positions and improves the accuracy of the filter processing.

In addition, accuracy can be improved by optimizing the filter parameters used in the filter processing with BSS (Blind Source Separation), a known technique that applies ICA (independent component analysis) to speech. BSS is proposed in the following reference: H. Saruwatari, T. Kawamura, T. Nishikawa, K. Shikano, "Fast-Convergence Algorithm for Blind Source Separation Based on Array Signal Processing," IEICE Trans. Fundamentals, Vol. E86-A, No. 3, pp. 286-291, March 2003.

With many channels, the BSS computation becomes complex and its accuracy drops; using the filter parameters obtained by the present invention as the initial values for BSS makes the computation feasible in a realistic time. Known BSS divides the signal into frequency bands and filters each band to separate the sounds, whereas the present invention, like microphone array techniques, processes the signal in units of audio samples. BSS relies on the fact that multiple sound sources are independent (even when two or more people speak, their utterances are uncorrelated), and this property can also be used to fine-tune and optimize the parameters of the present invention. The present invention can supply the initial values for optimization by BSS, and there is no precedent for doing so. Applying known BSS with the parameters of the present invention as initial values can further improve accuracy and allows fine adjustment when the body (the sound source) moves.

According to the present invention, a voice recording apparatus and a voice recording method capable of picking up the voice of each speaker without being affected by reverberant sound can be provided.

The best mode for carrying out the present invention is described below. FIG. 1 illustrates a voice recording apparatus that measures the transfer functions to the sound receiving points in advance and calculates inverse filters. As shown in FIG. 1, the voice recording apparatus 1 includes first to fourth audio input terminals 21 to 24, an audio output terminal 3, a reference signal output terminal 4, a reference signal generating unit 5, an impulse response storage unit 6, a transfer function calculation unit 7, an inverse filter coefficient calculation unit 8, an inverse filter processing unit 9, and a voice recording unit 10. Reference numerals 11 to 14 denote first to fourth microphones (a microphone array; the plurality of sound pickup units), 15 denotes a loudspeaker, 16 to 18 denote speakers (talkers), and 20 denotes an imaging unit such as a video camera. The speakers 16 to 18 are assumed to be located at sound source positions A to C, respectively.

The voice recording apparatus 1 emits a reference signal from each speaker position and measures the transfer function to each sound receiving point in advance. It then calculates an inverse filter of that transfer function and convolves it with the received signal, thereby removing reverberation before recording the voice.

The reference signal generating unit 5 generates the reference signal to be input to the loudspeaker 15, which is the sound generating unit placed at a sound source position. The reference signal generating unit 5 passes the generated reference signal through the reference signal output terminal 4 and outputs it from the loudspeaker 15, which is placed at the same position as the speakers 16 to 18. The first to fourth microphones 11 to 14 are installed at arbitrary, mutually different positions; for example, they are arranged two-dimensionally on the ceiling or floor of a conference room. The emitted reference signal is received by the first to fourth microphones 11 to 14. The impulse response storage unit 6 stores each of the audio signals (for example, impulse responses) obtained when the plurality of microphones 11 to 14, placed at different positions and synchronized in time, pick up the sound generated by the loudspeaker 15. The impulse response storage unit 6 is composed of, for example, a hard disk or a semiconductor memory, and stores the received impulse response of each channel.

FIG. 2 illustrates the contents of the impulse response storage unit 6. FIG. 2(A) shows the impulse input and impulse responses for sound source A, FIG. 2(B) those for sound source B, and FIG. 2(C) those for sound source C. In each part, (a) is the waveform of the impulse input supplied to the loudspeaker 15 as the reference signal, and (b), (c), (d), and (e) are the waveforms of the impulse responses received by the first microphone 11, the second microphone 12, the third microphone 13, and the fourth microphone 14, respectively.

The transfer function calculation unit 7 obtains the transfer functions from the sound source positions to the microphones 11 to 14 based on the audio signals stored in the impulse response storage unit 6 and the reference signal. Specifically, from the received signal of each channel and the reference signal, the transfer function calculation unit 7 calculates the transfer functions from the positions of the speakers 16 to 18 to the first microphone 11, to the second microphone 12, to the third microphone 13, and to the fourth microphone 14. The inverse filter coefficient calculation unit 8 calculates inverse filter coefficients that cancel the characteristics of the measured transfer function of each channel.
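The relationship between the reference signal, the recorded signal, and the path from a source position to one microphone can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the patent, which does not specify how the transfer function is computed: the recording is assumed to be noise-free, the path is modelled as a short impulse response (the time-domain counterpart of the transfer function), and the names `convolve`, `estimate_impulse_response`, `true_path`, and the sample values are hypothetical.

```python
def convolve(a, b):
    """Full linear convolution of two sample lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def estimate_impulse_response(reference, recorded, length):
    """Recover the first `length` taps of h such that recorded ~= reference * h.

    Assumption: noise-free recording and reference[0] != 0.
    """
    if reference[0] == 0:
        raise ValueError("reference must start with a non-zero sample")
    h = []
    for n in range(length):
        acc = recorded[n] if n < len(recorded) else 0.0
        # Subtract the contribution of the taps that are already known.
        for k in range(1, min(n, len(reference) - 1) + 1):
            acc -= reference[k] * h[n - k]
        h.append(acc / reference[0])
    return h


# Hypothetical path from one source position to one microphone:
# direct sound followed by two reflections.
true_path = [1.0, 0.0, 0.6, 0.0, 0.3]

# With an impulse as the reference, the recording is the impulse response itself.
from_impulse = estimate_impulse_response([1.0], convolve([1.0], true_path), 5)

# With a longer (made-up) reference signal, deconvolution is needed.
reference = [1.0, 0.5, -0.25]
from_burst = estimate_impulse_response(reference, convolve(reference, true_path), 5)
```

In both cases the estimate matches `true_path` because the example is noise-free; with real recordings this direct deconvolution is sensitive to noise, which is one reason an impulse-like reference is convenient.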

The inverse filter processing unit 9 filters the audio signals picked up by the plurality of microphones 11 to 14 based on the inverse filter coefficients. By filtering the audio signals received by the microphones 11 to 14 with the inverse filter coefficients, it outputs an audio signal from which reverberation has been removed from the audio output terminal 3. The voice recording unit 10 records the audio signal filtered by the inverse filter processing unit 9 as the audio signal generated at the sound source. It is composed of, for example, a hard disk device or a semiconductor memory, and stores the audio signal output from the audio output terminal 3 separately for each of the speakers 16 to 18. The sound emitted from the target sound source can thereby be picked up with high quality.
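As a rough illustration of this filtering stage, the sketch below convolves each microphone channel with its own inverse filter and averages the results. The averaging step, the two-tap room responses, and the names `dereverberate`, `rooms`, and `inverses` are assumptions made for the example; the inverse filters are written out by hand for these simple responses rather than computed.

```python
def convolve(a, b):
    """Full linear convolution of two sample lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def dereverberate(mic_signals, inverse_filters):
    """Filter every channel with its own inverse filter, then average the channels."""
    filtered = [convolve(x, g) for x, g in zip(mic_signals, inverse_filters)]
    length = min(len(f) for f in filtered)
    return [sum(f[n] for f in filtered) / len(filtered) for n in range(length)]


# Hypothetical dry source signal and two toy source-to-microphone responses.
source = [1.0, -2.0, 3.0, 0.5]
rooms = [[1.0, 0.5], [0.8, -0.4]]

# Truncated inverses of the toy responses (geometric series, 32 taps each).
inverses = [
    [(-0.5) ** n for n in range(32)],
    [1.25 * 0.5 ** n for n in range(32)],
]

mics = [convolve(source, h) for h in rooms]   # what the microphones would receive
recovered = dereverberate(mics, inverses)     # starts with the dry source again
```

The first samples of `recovered` reproduce `source`, followed by near-zero values; the residual comes only from truncating the inverse filters to 32 taps.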

The processing of the reference signal generating unit 5, the transfer function calculation unit 7, the inverse filter coefficient calculation unit 8, and the inverse filter processing unit 9 can be realized by a CPU (Central Processing Unit) and its peripheral circuits executing a predetermined program, or by an ASIC (Application Specific Integrated Circuit) or the like. The inverse filter processing unit 9 may also estimate the position of the sound source based on the image signal captured by the imaging unit 20 and then perform the filter processing. Using the image signal narrows down the candidate sound source positions and improves the accuracy of the filter processing.
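One simple way to use an externally estimated position is to pick the calibrated source position closest to it and apply that position's inverse filters. The sketch below shows only this selection step; the nearest-neighbour rule, the function name `nearest_calibrated_position`, and the room coordinates are assumptions for illustration, since the patent does not describe how the estimate is mapped to a filter set.

```python
import math


def nearest_calibrated_position(estimate, calibrated):
    """Return the label of the calibrated source position closest to `estimate`."""
    return min(calibrated, key=lambda label: math.dist(estimate, calibrated[label]))


# Hypothetical room coordinates (metres) of the positions measured in advance.
calibrated = {"A": (1.0, 0.5), "B": (2.5, 0.5), "C": (4.0, 0.5)}

# Hypothetical position estimate, e.g. derived from a camera image.
chosen = nearest_calibrated_position((2.2, 0.8), calibrated)
```

Here `chosen` is `"B"`, so the inverse filters measured for position B would be used.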

FIG. 3 is a flowchart of the inverse filter coefficient calculation process according to the embodiment of the present invention. This process is described next. First, the loudspeaker 15 is set at position A, one of the positions where the speakers 16 to 18 are scheduled to speak (step S11). Next, the reference signal generating unit 5 passes the generated reference signal (impulse input) through the reference signal output terminal 4 and outputs it from the loudspeaker 15 placed at the speaker position (step S12). The reference signal emitted from the loudspeaker 15 is received by the first to fourth microphones 11 to 14 (step S13). The impulse response storage unit 6 stores the received impulse response of each channel (step S14).

The same procedure is then carried out for positions B and C, where the speakers 16 to 18 are scheduled to speak, and the impulse response of each channel is stored in the impulse response storage unit 6. In a meeting or a presentation, the sound source positions are roughly known from the positions of the chairs and of the presenter, which is what makes this kind of advance measurement possible.
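The measurement loop of steps S11 to S15 can be sketched as below. This is an illustrative outline only: `play_and_record` is a hypothetical callback standing in for "emit the reference signal at this position and record all microphones synchronously", and the dictionary layout is an assumption about how the stored responses might be organized.

```python
def collect_impulse_responses(positions, play_and_record):
    """Measure every planned source position once and store the per-microphone responses."""
    store = {}
    for position in positions:
        # S11-S13: loudspeaker placed at `position`, reference emitted, all microphones record.
        responses = play_and_record(position)
        # S14: keep one impulse response per microphone channel.
        store[position] = {mic: list(samples) for mic, samples in responses.items()}
    return store


def all_positions_measured(store, positions, mic_ids):
    """S15: true when every planned position has a response for every microphone."""
    return all(
        position in store and set(store[position]) == set(mic_ids)
        for position in positions
    )


def fake_play_and_record(position):
    """Stand-in for real hardware: a delayed pulse whose delay depends on position and microphone."""
    base_delay = {"A": 1, "B": 2, "C": 3}[position]
    return {mic: [0.0] * (base_delay + mic - 11) + [1.0, 0.3] for mic in (11, 12, 13, 14)}


store = collect_impulse_responses(["A", "B", "C"], fake_play_and_record)
ready = all_positions_measured(store, ["A", "B", "C"], [11, 12, 13, 14])
```

Keying the store by position and then by microphone mirrors FIG. 2, where each sound source has one stored response per microphone.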

When the transfer function calculation unit 7 determines that the impulse responses for all scheduled speaking positions have been collected ("Y" in step S15), it calculates, from the received signal of each channel and the reference signal, the transfer functions from the positions of the speakers 16 to 18 to the first microphone 11, to the second microphone 12, to the third microphone 13, and to the fourth microphone 14 (step S16). The inverse filter coefficient calculation unit 8 calculates inverse filter coefficients that cancel the characteristics of the measured transfer function of each channel (step S17).
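The patent does not state how the inverse filter coefficients are calculated. One common approach, shown here purely as an assumed example, is a regularized inverse in the frequency domain, G = conj(H) / (|H|^2 + eps). The helper names `dft` and `regularized_inverse_filter`, the FFT length, the value of `eps`, and the toy response `h` are all hypothetical; a plain O(N^2) DFT is used so that the sketch needs only the standard library.

```python
import cmath


def dft(x, inverse=False):
    """Plain O(N^2) discrete Fourier transform (forward or inverse)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out


def regularized_inverse_filter(h, n_fft, eps=1e-6):
    """Return n_fft real taps g whose circular convolution with h is close to a unit impulse."""
    if n_fft < len(h):
        raise ValueError("n_fft must be at least len(h)")
    padded = list(h) + [0.0] * (n_fft - len(h))
    H = dft(padded)
    # eps keeps the division bounded at frequencies where |H| is small.
    G = [Hk.conjugate() / (abs(Hk) ** 2 + eps) for Hk in H]
    return [v.real for v in dft(G, inverse=True)]


# Hypothetical measured impulse response for one channel.
h = [1.0, 0.5, 0.25]
g = regularized_inverse_filter(h, 64)
```

The regularization term trades exact inversion for robustness: a larger `eps` amplifies noise less but leaves more residual reverberation.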

Next, the processing for recording each speaker's voice individually is described. FIG. 4 is a flowchart of this processing. As an example, assume that the speaker 16 speaks at sound source position A (step S21). Audio signals are input to the first to fourth microphones 11 to 14, and the input audio signals are passed to the inverse filter processing unit 9 via the first to fourth audio input terminals 21 to 24 (step S22).

The inverse filter processing unit 9 uses a technique called a delay-and-sum array, which raises the sensitivity toward a focal position by bringing the sounds arriving at the first to fourth microphones 11 to 14 into phase and adding them. By directing the focus at the target sound source position, it suppresses noise originating elsewhere and generates an audio signal for sound source A with an improved S/N ratio (step S23). This produces an audio signal in which the sound from the target source is emphasized. Next, the inverse filter processing unit 9 filters the audio signals received by the plurality of microphones 11 to 14 using the inverse filter coefficients and outputs the result (step S24). As a result, an audio signal from which reverberation has been removed is output from the audio output terminal 3.
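The delay-and-sum step (S23) can be illustrated with integer sample delays. In this minimal sketch, each channel is advanced by the arrival delay expected for the focal position and the channels are averaged; the function name `delay_and_sum`, the delays, and the test pulse are assumptions, and a practical array would typically need fractional delays derived from the geometry.

```python
def delay_and_sum(channels, delays):
    """Advance each channel by its arrival delay (in samples), then average the channels."""
    length = min(len(x) - d for x, d in zip(channels, delays))
    return [
        sum(x[n + d] for x, d in zip(channels, delays)) / len(channels)
        for n in range(length)
    ]


# Hypothetical source pulse and per-microphone arrival delays for the focal position.
pulse = [0.0, 1.0, 0.0, -1.0, 0.0]
delays = [0, 2, 3, 5]
channels = [[0.0] * d + pulse + [0.0] * (5 - d) for d in delays]

focused = delay_and_sum(channels, delays)          # aligned: the pulse adds coherently
unfocused = delay_and_sum(channels, [0, 0, 0, 0])  # misaligned: the pulse is smeared
```

With the correct delays the output equals the original pulse, whereas with all delays set to zero the peak amplitude drops to a quarter, which is the emphasis effect the delay-and-sum array relies on.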

The voice recording unit 10 stores the audio signal output from the audio output terminal 3 separately for each of the speakers 16 to 18, recording each speaker's voice individually (step S25). The voice recording apparatus ends the processing when an instruction to end recording is input. The voice of each speaker emitted from the target sound source can thereby be recorded individually with high quality.
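Storing one recording per speaker (step S25) might look like the following sketch, which encodes each speaker's filtered samples as a separate mono WAV byte string using Python's standard `wave` module. The 16 kHz sample rate, the 16-bit format, and the names `encode_wav` and `record_per_speaker` are assumptions; the patent only states that the signal is stored per speaker on a hard disk or semiconductor memory.

```python
import io
import struct
import wave


def encode_wav(samples, sample_rate=16000):
    """Encode float samples in [-1, 1] as 16-bit mono PCM WAV bytes."""
    pcm = [max(-32767, min(32767, round(s * 32767))) for s in samples]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)            # 2 bytes per sample = 16-bit PCM
        w.setframerate(sample_rate)
        w.writeframes(struct.pack("<%dh" % len(pcm), *pcm))
    return buf.getvalue()


def record_per_speaker(outputs):
    """Map each speaker id to its own WAV recording."""
    return {speaker: encode_wav(samples) for speaker, samples in outputs.items()}


# Hypothetical dereverberated output samples for speakers 16 and 17.
recordings = record_per_speaker({16: [0.0, 0.5, -0.5], 17: [1.0, -1.0]})
```

Each value in `recordings` could be written to its own file, giving one recording per speaker.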

According to the voice recording apparatus of the above embodiment, audio from which reverberation has been removed can be produced using the reference signal, so each speaker's voice can be picked up individually without being affected by reverberant sound.

Although a preferred embodiment of the present invention has been described in detail above, the present invention is not limited to that specific embodiment, and various modifications and changes are possible within the scope of the gist of the present invention described in the claims. The above description used three sound source positions as an example, but the present invention is not limited to this and can be applied when there are more sound source positions. Likewise, although an example using four microphones was described, in practice using more microphones is better for accuracy. An impulse input was used as the example of the reference signal, but the present invention is not limited to this, and any signal that can serve as a reference may be used. The present invention can also be applied in a variety of settings, not only in meetings. Finally, although an image signal captured by the imaging unit was used as the sound source information indicating the position of the sound source, the sound source information is not limited to an image signal. That is, the inverse filter processing unit can estimate the position of the sound source based on any sound source information indicating that position and then perform the filter processing.

FIG. 1 is a diagram illustrating a voice recording apparatus that measures the transfer functions to the sound receiving points in advance and calculates inverse filters.
FIG. 2 is a diagram for explaining the contents of the impulse response storage unit.
FIG. 3 is a flowchart of the inverse filter coefficient calculation process according to the embodiment of the present invention.
FIG. 4 is a flowchart of the processing for recording each speaker's voice individually.

Explanation of symbols

1 Voice recording apparatus
21 to 24 Audio input terminals
3 Audio output terminal
4 Reference signal output terminal
5 Reference signal generating unit
6 Impulse response storage unit
7 Transfer function calculation unit
8 Inverse filter coefficient calculation unit
9 Inverse filter processing unit

Claims

1. A voice recording apparatus comprising:
a reference signal generating unit that generates a reference signal to be input to a sound generating unit placed at a sound source;
a storage unit that stores each of the audio signals obtained when a plurality of sound pickup units, placed at different positions and synchronized in time, pick up the sound generated by the sound generating unit;
a transfer function calculation unit that obtains a transfer function from the position of the sound source to the sound pickup units based on the audio signals stored in the storage unit and the reference signal;
an inverse filter coefficient calculation unit that calculates inverse filter coefficients that cancel the characteristics of the obtained transfer function; and
a filter processing unit that filters the audio signals picked up by the plurality of sound pickup units based on the inverse filter coefficients.

2. The voice recording apparatus according to claim 1, wherein the filter processing unit picks up a target sound by bringing the sounds arriving at the plurality of sound pickup units into phase and adding them.

3. The voice recording apparatus according to claim 1, wherein the reference signal is an impulse input, and the audio signals picked up by the plurality of sound pickup units are impulse responses.

4. The voice recording apparatus according to claim 1, further comprising a voice recording unit that records the audio signal filtered by the filter processing unit as the audio signal generated at the sound source.

5. The voice recording apparatus according to claim 1, wherein the filter processing unit estimates the position of the sound source based on sound source information indicating the position of the sound source and performs the filter processing.

6. The voice recording apparatus according to claim 5, wherein the sound source information is an image signal captured by an imaging unit.

7. A voice recording method comprising:
generating a reference signal to be input to a sound generating unit placed at a sound source;
storing in a storage unit each of the audio signals obtained when a plurality of sound pickup units, placed at different positions and synchronized in time, pick up the sound generated by the sound generating unit;
obtaining a transfer function from the position of the sound source to the sound pickup units based on the audio signals stored in the storage unit and the reference signal;
calculating inverse filter coefficients that cancel the characteristics of the transfer function; and
filtering the audio signals picked up by the plurality of sound pickup units based on the inverse filter coefficients.

8. The voice recording method according to claim 7, wherein the reference signal is an impulse input, and the audio signals picked up by the plurality of sound pickup units are impulse responses.

9. The voice recording method according to claim 7, wherein the filtering step estimates the position of the sound source based on sound source information indicating the position of the sound source and performs the filter processing.

10. The voice recording method according to claim 9, wherein the sound source information is an image signal captured by an imaging unit.