JP2008205896A

JP2008205896A - Sound emitting and picking up device

Info

Publication number: JP2008205896A
Application number: JP2007040507A
Authority: JP
Inventors: Akira Ouchi; 亮大内; Takuro Sone; 卓朗曽根
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2007-02-21
Filing date: 2007-02-21
Publication date: 2008-09-04
Anticipated expiration: 2027-02-21
Also published as: JP5380777B2

Abstract

PROBLEM TO BE SOLVED: To provide a sound emitting and picking up device with which settings that direct a sound beam in a particular direction, or that lower the volume only in a particular direction, can be made easily.

SOLUTION: A control part 4 detects the position of a sound source and also analyzes the content of the sound picked up by the microphones to extract a command; for example, particular speech content is recognized by speech recognition and extracted as a command. The control part 4 performs directivity setting processing that sets the delay amounts of beam control parts 7A and 7B on the basis of the detected sound source position and the command content. As a result, a user can direct a sound beam toward himself or herself, or in another direction, simply by uttering a particular command phrase.

COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to a sound emission and collection device that picks up sound and outputs a sound beam having strong directivity in a specific direction.

Description of the Related Art

Conventionally, sound emitting devices are known that output a sound beam having strong directivity in a specific direction by delay-controlling the audio signal supplied to each unit of a speaker array.

For example, Patent Document 1 proposes an apparatus that, in order to set directivity-control parameters such as the delay amount of each speaker unit, identifies the talker's position with a microphone array and directs a sound beam toward the talker.

Patent Document 1: JP 2006-270876 A

However, because the apparatus of Patent Document 1 always directs the sound beam toward the talker, it has low versatility. In a home, for example, one user may want to listen to a movie's audio at high volume while another user wants the movie audio turned down in order to make a phone call. There are thus cases where one wants not only to direct a sound beam in a specific direction but also to lower the volume only in a specific direction.

SUMMARY OF THE INVENTION

Accordingly, an object of the present invention is to provide a sound emission and collection device with which settings such as directing a sound beam in a specific direction, or lowering the volume only in a specific direction, can be made easily.

The sound emission and collection device of the present invention comprises: a sound collection unit that picks up sound and outputs a collected sound signal; a sound source position detection unit that detects the position of a sound source; a sound emission unit that emits sound with directivity in a specific direction; a voice analysis unit that receives the collected sound signal and extracts a command, contained in the collected sound signal, that specifies directivity; and a control unit that sets the directivity pattern of the sound emission unit on the basis of the sound source position detected by the sound source position detection unit and the content of the directivity command extracted by the voice analysis unit.

In this configuration, a command specifying directivity is extracted from the collected sound signal; for example, words such as “here” and “louder” are extracted by speech recognition. The position of the sound source that produced the collected sound signal is also detected, for example by linear prediction on the output audio signals of the microphone units of a microphone array. Directivity is then controlled on the basis of the extracted command and the detected sound source position. Various directivity settings are conceivable; for example, when the word “here” is extracted, a sound beam with strong directivity is directed toward the position from which it was spoken.

Further, in the present invention, the voice analysis unit also extracts a command, contained in the collected sound signal, that selects a source; the control unit sets the directivity pattern for the audio of the selected source on the basis of that source-selection command; and the sound emission unit, following the directivity patterns set by the control unit, emits the audio of different sources simultaneously with directivity in a plurality of directions.

In this configuration, audio from different sources is emitted with directivity in a plurality of directions. For example, by individually delaying the audio signals input to the speaker units of a speaker array, directivity can be given in several directions at the same time. In addition, a command that selects a source is extracted from the collected sound signal. For example, when the audio consists of two sources, “source A” and “source B”, the words “source A” and “source B” are extracted by speech recognition, and only the directivity pattern of the selected source's audio is set. Thus, when the word “here” is extracted after the utterance “source A”, only the beam carrying source A's audio is directed toward that position.

Further, in the present invention, the voice analysis unit also extracts a trigger command contained in the collected sound signal, and the control unit sets a directivity pattern from the content of a subsequent directivity command only if the voice analysis unit has extracted the trigger command.

In this configuration, a trigger command is extracted from the collected sound signal. The trigger command is, for example, the phrase “command input”, and a directivity pattern is set only when this phrase has been recognized. For example, when the word “here” is extracted after the utterance “command input”, only the beam carrying source A's audio is directed toward that position; when “here” is extracted on its own, it is ignored. Ignoring speech that was uttered unintentionally ensures that only settings the user deliberately intends are applied.
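The trigger gating just described can be sketched as a small state machine. This is a minimal illustration, assuming a speech recognizer has already turned utterances into text phrases; the class name `TriggerGate` and the English trigger string are hypothetical, and because the source does not say when the gate should close again, the sketch simply stays armed.

```python
from dataclasses import dataclass, field

TRIGGER = "command input"  # hypothetical trigger phrase, following the example above


@dataclass
class TriggerGate:
    """Pass recognized phrases on only after the trigger phrase has been heard.

    Assumes phrases arrive as recognized text; disarming policy is left open.
    """
    armed: bool = False
    accepted: list = field(default_factory=list)

    def feed(self, phrase: str) -> bool:
        """Return True if the phrase is accepted as a command."""
        phrase = phrase.strip().lower()
        if phrase == TRIGGER:
            self.armed = True      # trigger recognized: accept what follows
            return False
        if self.armed:
            self.accepted.append(phrase)
            return True
        return False               # no trigger yet: treat as unintentional speech
```

Feeding “here”, then “command input”, then “here” accepts only the second “here”.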

Further, the present invention is characterized in that the voice analysis unit extracts a specific rhythm pattern contained in the collected sound signal as a command.

In this configuration, a specific rhythm pattern is extracted as a command. For example, short single sounds at or above a predetermined level (such as hand claps) are counted, and a command is extracted according to the number of such sounds within a predetermined time (for example, 3 seconds): one sound is interpreted as “louder” and two as “softer”.

Further, the present invention is characterized in that the control unit sets the directivity pattern so that the volume is lowered only in a predetermined direction.

In this configuration, the directivity pattern lowers the volume only in a predetermined direction. With a speaker array, the sounds emitted from the individual speaker units weaken one another in regions where their phases differ. Therefore, by controlling the delay amount of the audio signal input to each speaker unit, the directivity can be set so that the volume drops only in a predetermined direction. In this case, a phrase such as “mute only here” may be extracted as the directivity command. A user can then lower the volume only in the particular area that should be quiet, simply by uttering that phrase.
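One way to see how an array can produce a dip in a single direction is to evaluate its far-field response. The sketch below is an illustration under stated assumptions, not the patent's method: it steers all units toward the target angle and then inverts the polarity of half the array (a gain of -1, which the gain control of the beam control units could supply), so the two halves cancel exactly in the steered direction while still radiating elsewhere. This is the classic difference pattern; the unit spacing, frequency, speed of sound, and function names are hypothetical.

```python
import cmath
import math

C = 343.0  # speed of sound in air, m/s (assumed)


def steering_delays(positions, angle_deg, c=C):
    """Per-unit delays (s) that put all units in phase toward angle_deg.

    positions: unit coordinates (m) along the array axis.
    angle_deg: direction measured from axis Y (perpendicular to the array).
    The delays are shifted so the smallest one is zero.
    """
    s = math.sin(math.radians(angle_deg))
    raw = [x * s / c for x in positions]
    m = min(raw)
    return [d - m for d in raw]


def dip_gains(n):
    """+1 on the first half of the units, -1 on the second half."""
    if n % 2:
        raise ValueError("use an even number of units so the halves cancel")
    return [1.0] * (n // 2) + [-1.0] * (n // 2)


def response(positions, delays, gains, angle_deg, freq, c=C):
    """Normalized far-field magnitude (0..1) at one frequency and angle."""
    s = math.sin(math.radians(angle_deg))
    w = 2 * math.pi * freq
    total = sum(g * cmath.exp(-1j * w * (d - x * s / c))
                for x, d, g in zip(positions, delays, gains))
    return abs(total) / len(positions)


# Example (assumed geometry): 8 units, 5 cm apart, target 30 degrees, 1 kHz.
units = [0.05 * i for i in range(8)]
tau = steering_delays(units, 30.0)
beam = response(units, tau, [1.0] * 8, 30.0, 1000.0)          # sound beam: full level
dip = response(units, tau, dip_gains(8), 30.0, 1000.0)        # sound dip: cancels
elsewhere = response(units, tau, dip_gains(8), -30.0, 1000.0)  # still audible here
```

A difference pattern also has other, frequency-dependent nulls, so a practical design would check the response over the whole audio band rather than at one frequency.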

Further, the present invention also comprises an echo canceller that removes the echo component from the collected sound signal, and the voice analysis unit extracts commands from the collected sound signal after the echo canceller has removed the echo component.

In this configuration, the echo component is removed from the collected sound signal. Because speech recognition and similar processing are performed on the echo-free signal, commands are extracted more accurately.

According to the present invention, by extracting a directivity command contained in the collected sound signal, a user can, simply by speaking, direct a sound beam in a specific direction or lower the volume only in a specific direction.

The sound emission and collection device of this embodiment controls its sound emission directivity on the basis of sound picked up by its microphones, emitting audio input from other devices with its directivity controlled toward a predetermined direction. Connected to a television or audio equipment, it can be used as a speaker device that plays various audio sources; by outputting the sound picked up by its microphones to another device, it can also be used as an audio conferencing device.

Hereinafter, a sound emission and collection device according to an embodiment of the present invention is described with reference to the drawings. FIG. 1 is a block diagram showing the configuration of the sound emission and collection device.

The sound emission and collection device 1 includes a microphone array 2, an input/output interface (I/F) 3, a control unit 4, a speaker array 5, an echo canceller 6, beam control units 7A and 7B, a mixer 8, D/A converters 11 to 18, amplifiers (AMP) 31 to 38, amplifiers (AMP) 41 to 48, A/D converters 51 to 58, a sound collection beam generation unit 61, and a sound collection beam selection unit 71.

The microphone array 2 consists of a plurality of microphone units 21 to 28 (eight in the illustrated example) arranged in a straight line, and outputs the sound picked up by each of them as collected sound signals. The speaker array 5 consists of a plurality of speaker units 51 to 58 (eight in the illustrated example) arranged in a straight line, each of which emits the audio signal input to it.

The collected sound signals from the microphone units 21 to 28 are amplified by the front-end amplifiers 41 to 48, digitized by the A/D converters 51 to 58, and then input to the echo canceller 6.

The echo canceller 6 includes a filter processing unit 60, to which the audio signals for the speaker units 51 to 58 supplied from the mixer 8 are input. The filter processing unit 60 filters each of these audio signals to generate a pseudo echo signal that simulates the sound returning (looping back) from the speaker array 5 to the microphone array 2. It subtracts this pseudo echo signal from each collected sound signal to cancel the echo component, and outputs the result to the sound collection beam generation unit 61. Cancelling the echo component in this way improves the accuracy of the sound source position detection and command analysis processing described later.
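The patent does not say how the filter processing unit 60 obtains its filters. A common choice for acoustic echo cancellation is an adaptive FIR filter updated with the normalized least-mean-squares (NLMS) rule; the single-channel sketch below assumes that technique and a made-up echo path, and shows the pseudo echo estimate being subtracted from the microphone signal. A canceller for this device would need one such filter per speaker/microphone pair; the function name and parameter values are hypothetical.

```python
import random


def nlms_echo_cancel(ref, mic, taps=16, mu=0.5, eps=1e-8):
    """Remove an adaptively estimated echo of `ref` from `mic` (NLMS, assumed).

    ref: signal sent to the loudspeaker; mic: microphone signal containing its echo.
    Returns (residual, weights): the echo-cancelled signal and the final FIR taps.
    """
    w = [0.0] * taps
    buf = [0.0] * taps                     # newest reference sample first
    residual = []
    for x, d in zip(ref, mic):
        buf = [x] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))     # pseudo echo estimate
        e = d - y                                       # echo-cancelled output
        step = mu * e / (eps + sum(b * b for b in buf))
        w = [wi + step * bi for wi, bi in zip(w, buf)]
        residual.append(e)
    return residual, w


# Example: white-noise playback and a hypothetical 4-tap echo path.
rng = random.Random(0)
ref = [rng.gauss(0.0, 1.0) for _ in range(3000)]
h = [0.6, -0.3, 0.1, 0.05]
mic = [sum(h[k] * ref[n - k] for k in range(len(h)) if n >= k) for n in range(len(ref))]
residual, w = nlms_echo_cancel(ref, mic)
```

With no near-end speech, the learned taps converge to the echo path and the residual becomes negligible; in practice double-talk detection is usually needed so that the talker's own voice does not disturb adaptation.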

The sound collection beam generation unit 61 delays and sums the collected sound signals from which the echo canceller 6 has removed the echo components, thereby shaping the sound collection directivity of the microphone array 2 as a whole into a beam, called a sound collection beam. This beam picks up sound generated in a specific region with high gain. In this embodiment, sound collection beams MB11 to MB14 corresponding to four regions around the microphone array 2 are generated.

FIG. 2 is a diagram showing an example of the sound collection beams. The sound collection beam generation unit 61 forms a sound collection beam focused on the position to be picked up, capturing sound from a narrow range with high gain. The sound collection areas P1 to P4 are set, for example, in front of the microphone array. The sound collection beam generation unit 61 delays the audio signals picked up by the microphone units 21 to 28 so that they are aligned as if each unit were equidistant from the focal point (F3 in the figure), and then sums them, so that sound generated around the focal point (sound collection area P3) is extracted with high gain.
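The focusing rule of FIG. 2 amounts to equalizing the propagation times from the focal point to every microphone. The sketch below computes those delays for an assumed geometry (coordinates in metres, speed of sound 343 m/s); the function names, array pitch, and focal point are hypothetical. Summing the microphone signals after applying these delays adds sound from the focus coherently.

```python
import math

C = 343.0  # speed of sound, m/s (assumed)


def focus_delays(mic_xy, focus_xy, c=C):
    """Delays (s) that time-align sound arriving from focus_xy at every microphone.

    The microphone nearest the focus hears the sound first and is delayed most;
    the farthest one gets zero delay, so all delays are causal.
    """
    dists = [math.dist(m, focus_xy) for m in mic_xy]
    farthest = max(dists)
    return [(farthest - d) / c for d in dists]


def to_samples(delays, fs):
    """Round delays to whole samples for a simple integer-delay implementation."""
    return [round(t * fs) for t in delays]


mics = [(0.04 * i, 0.0) for i in range(8)]   # 8 units, 4 cm pitch (assumed)
f3 = (0.5, 1.0)                             # hypothetical focal point, like F3
delays = focus_delays(mics, f3)
```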

In FIG. 1, the four sound collection beams MB11 to MB14 generated by the sound collection beam generation unit 61 are input to the sound collection beam selection unit 71, which selects the beam with the highest signal level and outputs it to the input/output I/F 3 as the main sound collection beam.

FIG. 3 is a block diagram showing the main configuration of the sound collection beam selection unit 71. The sound collection beam selection unit 71 includes a BPF (band-pass filter) 171, a full-wave rectifier circuit 172, a peak detection circuit 173, a level comparator 174, and a signal selection circuit 175.

The BPF 171 is a band-pass filter whose passband is the principal frequency band of human speech; it filters the sound collection beams MB11 to MB14 and outputs them to the full-wave rectifier circuit 172, which full-wave rectifies them (takes their absolute values). The peak detection circuit 173 detects the peaks of the rectified beams MB11 to MB14 and outputs peak value data Ps11 to Ps14. The level comparator 174 compares Ps11 to Ps14 and supplies the signal selection circuit 175 with selection instruction data designating the sound collection beam whose peak value is highest. The level comparator 174 also supplies the selection instruction data to the control unit 4, which uses it for the sound source position detection processing described later. The signal selection circuit 175 selects the sound collection beam indicated by the selection instruction data and outputs it as the main sound collection beam both to the input/output I/F 3 and to the control unit 4, which uses it for the command analysis processing described later.
Selecting the highest-level beam works because the sound collection beam for the region containing the sound source has a higher signal level than the beams for the other regions.
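The chain of BPF 171, full-wave rectifier 172, peak detector 173, and level comparator 174 can be sketched in software as follows. The band-pass filter is an RBJ-cookbook biquad centred at 1 kHz with Q = 0.7, an assumed stand-in for “the principal band of human speech”; peak detection is simplified to the maximum absolute value over one block rather than a decaying peak hold, and the function names are hypothetical.

```python
import math


def bandpass(x, fs, f0=1000.0, q=0.7):
    """RBJ-cookbook band-pass biquad (0 dB peak gain), direct form I."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    b0, b1, b2 = alpha, 0.0, -alpha
    a0, a1, a2 = 1 + alpha, -2 * math.cos(w0), 1 - alpha
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = (b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2) / a0
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def select_beam(beams, fs):
    """Return (index of the loudest beam, peak values Ps for each beam)."""
    # Full-wave rectification (abs) followed by a block peak, per beam.
    peaks = [max(abs(v) for v in bandpass(b, fs)) for b in beams]
    best = max(range(len(peaks)), key=peaks.__getitem__)  # level comparator
    return best, peaks
```

Because the level comparison happens after band-pass filtering, a beam dominated by low-frequency hum can lose to a quieter beam that carries energy in the speech band.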

When the sound emission and collection device 1 is used as an audio conferencing device, the main sound collection beam input to the input/output I/F 3 (output I/F 30C) is output to another device. When it is sent over a network, it is output as audio information in a predetermined protocol.

Functionally, the input/output I/F 3 consists of input I/F 30A, input I/F 30B, and output I/F 30C, and exchanges audio signals (or audio information) with other devices. An audio signal input to input I/F 30A is output to beam control unit 7A, and an audio signal input to input I/F 30B is output to beam control unit 7B. Input audio information is first converted into an audio signal and then output.

The beam control units 7A and 7B apply delay processing and gain control to the audio signals input to the speaker units 51 to 58 of the speaker array 5, and can thereby form a sound beam with strong directivity in a predetermined direction. Conversely, they can also form a pattern in which the volume is lowered only in a predetermined direction (hereinafter called a sound dip). The delay and gain of the audio signal for each of the speaker units 51 to 58 are set by the control unit 4. The sounds emitted from the speaker units 51 to 58 reinforce one another in regions where their phases coincide and weaken one another in regions where their phases differ. Therefore, by controlling the delay of the audio signal input to each speaker unit, either a sound beam or a sound dip can be directed in a specific direction.
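The per-unit delay-plus-gain operation of a beam control unit can be sketched for a straight, evenly spaced array and a far-field target. This is an illustration only: delays are rounded to whole samples for simplicity (a practical implementation would likely use fractional delays), the angle is measured from the axis perpendicular to the array, and the function name and parameters are hypothetical.

```python
import math

C = 343.0  # speed of sound, m/s (assumed)


def beam_feeds(signal, positions, angle_deg, fs, gain=1.0, c=C):
    """Return (feeds, delays): one delayed, scaled copy of `signal` per speaker unit.

    positions: unit coordinates (m) along the array; angle_deg: beam direction
    measured from the axis perpendicular to the array. Delays are rounded to
    whole samples and shifted so the smallest is zero; every feed is padded to
    the same length.
    """
    s = math.sin(math.radians(angle_deg))
    raw = [x * s / c for x in positions]
    m = min(raw)
    delays = [round((r - m) * fs) for r in raw]
    span = max(delays)
    feeds = [[0.0] * k + [gain * v for v in signal] + [0.0] * (span - k)
             for k in delays]
    return feeds, delays
```

For eight units 5 cm apart at 48 kHz, steering to 30 degrees gives a delay step of about 3.5 samples per unit, so rounding already introduces noticeable phase error at high frequencies.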

The audio signals output by the beam control units 7A and 7B are input to the mixer 8, which mixes, for each of the speaker units 51 to 58, the signals from 7A and 7B and outputs them to the echo canceller 6. As described above, the echo canceller 6 generates the pseudo echo signal from these per-unit audio signals; it also passes them on to the D/A converters 11 to 18, which convert them into analog audio signals. After amplification by the amplifiers 31 to 38, they are emitted by the speaker units 51 to 58.

By having the beam control units 7A and 7B apply delays that send their sound beams to different regions, users at different locations can listen to audio from different sources. For example, as shown in FIG. 4, user h1 on the living-room sofa can listen to movie audio (source A) while user h2 at the dining table listens to music (source B). Even for the same movie, user h1 can listen to the Japanese soundtrack while user h2 listens to the English one. The source and direction of each sound beam (or sound dip) are set by the control unit 4.
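The mixer 8 combines, unit by unit, the feeds that beam control units 7A and 7B compute for the same speaker units, which is what lets two sources leave the array in two directions at once. A minimal sketch, assuming each beam controller's output is represented as one list of samples per speaker unit (a hypothetical data layout); the function name is also hypothetical.

```python
def mix_units(*beams):
    """Sum per-speaker-unit feeds from several beam controllers, unit by unit.

    Each argument is a list with one sample list per speaker unit; all
    arguments must cover the same units. Shorter feeds are treated as
    zero-padded at the end.
    """
    n_units = len(beams[0])
    if any(len(b) != n_units for b in beams):
        raise ValueError("all beam controllers must drive the same speaker units")
    mixed = []
    for unit in range(n_units):
        feeds = [b[unit] for b in beams]
        length = max(len(f) for f in feeds)
        mixed.append([sum(f[i] for f in feeds if i < len(f)) for i in range(length)])
    return mixed


# Two toy "beam controllers" (standing in for 7A and 7B) driving two speaker units.
beam_7a = [[1.0, 0.5], [0.0, 1.0, 0.5]]
beam_7b = [[0.25, 0.25, 0.25], [0.125]]
speaker_feeds = mix_units(beam_7a, beam_7b)
```

Because mixing is a plain sum, each source keeps its own delay pattern and therefore its own direction; the two beams simply superpose in the room.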

The control unit 4 includes a CPU and performs sound source position detection processing, which locates the sound source from the selection instruction data input from the level comparator 174. In the simplest case, it judges that the sound source lies in the sound collection area of the beam indicated by the selection instruction data and takes that area as the sound source position. Alternatively, although not shown, the control unit 4 may receive the collected sound signals of the microphone units 21 to 28 (as output by the echo canceller 6) and detect the sound source position with other common techniques such as linear prediction or the minimum variance method.
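The patent names linear prediction and the minimum variance method as alternative localization techniques but gives no details. As a simpler substitute (not the patent's method), the sketch below estimates the time difference of arrival between two microphones by maximizing their cross-correlation, then converts it to a far-field bearing. The microphone spacing, sample rate, and function names are assumptions, and the sign of the bearing depends on which microphone is taken as reference.

```python
import math
import random

C = 343.0  # speed of sound, m/s (assumed)


def estimate_lag(x1, x2, max_lag):
    """Lag (samples) by which x2 trails x1, found by maximizing cross-correlation."""
    n = len(x1)
    best_lag, best_val = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        # Only sum over indices where both x1[i] and x2[i + lag] exist.
        val = sum(x1[i] * x2[i + lag] for i in range(max(0, -lag), min(n, n - lag)))
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag


def bearing_deg(lag, spacing, fs, c=C):
    """Far-field arrival angle from broadside implied by a lag of `lag` samples."""
    s = max(-1.0, min(1.0, c * lag / (fs * spacing)))  # clamp rounding overshoot
    return math.degrees(math.asin(s))
```

With 10 cm spacing at 16 kHz the largest physically possible lag is under 5 samples, so a search range of a few samples is enough and the angular resolution is coarse; interpolating around the correlation peak is a common refinement.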

The control unit 4 also performs command analysis processing on the main sound collection beam input from the signal selection circuit 175. Command analysis applies speech recognition and extracts commands from the speech content of the main sound collection beam. Specifically, the control unit 4 compares the input audio signal with audio signal patterns stored in advance in a memory (not shown) or the like, using a probabilistic model such as a hidden Markov model. When it recognizes specific speech content in the input audio signal, it extracts that content as a command. Commands are classified into triggers, source selections, and beam settings.

The control unit 4 predefines the utterance to be extracted as the trigger command (for example, “command input”). After recognizing this trigger utterance, it performs command extraction processing, in which subsequent speech is extracted as source-selection and beam-setting commands; if the trigger utterance has not been recognized, command extraction is not performed.

Similarly, the control unit 4 predefines the speech content to be extracted as source-selection commands, for example “source A” and “source B”.
It also predefines the speech content to be extracted as beam-setting commands, for example “louder” and “softer”.
Note that the extraction of source-selection and beam-setting commands is not essential to the present invention.
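Once a recognizer (for example an HMM-based one, as mentioned above) has produced text, splitting an utterance such as “Source A, here” into its source-selection and beam-setting parts can be as simple as matching against the predefined vocabularies. The sketch below assumes English phrases and a hypothetical vocabulary; the recognition step itself is not shown.

```python
import re

# Hypothetical vocabularies; the patent only gives examples such as these.
SOURCES = {"source a": "A", "source b": "B"}
BEAM_COMMANDS = ("here", "louder", "softer", "mute only here",
                 "opposite direction", "direction 1", "direction 2", "direction 3")


def parse_command(text):
    """Split a recognized utterance into (source, beam_command).

    Either part is None when absent or not in the vocabulary.
    """
    t = re.sub(r"[^a-z0-9 ]", " ", text.lower())   # drop punctuation
    t = " ".join(t.split())                        # normalize whitespace
    source = None
    for phrase, name in SOURCES.items():
        if t.startswith(phrase):
            source, t = name, t[len(phrase):].strip()
            break
    beam = t if t in BEAM_COMMANDS else None
    return source, beam
```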

Commands are not limited to speech recognition; a specific rhythm pattern, for example, can also be extracted as a command. The control unit 4 counts single short sounds (such as hand claps) whose level is at or above a predetermined level and that stay above that level only briefly, and extracts a command according to the number of such sounds within a predetermined time (for example, 3 seconds): one sound is interpreted as “louder” and two as “softer”.
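The clap-counting rule can be sketched on a per-frame level envelope. The threshold, frame length, and maximum burst length are assumed values, the window is measured from the first clap (the source does not say where it starts), and the mapping of one clap to “louder” and two to “softer” follows the example in the text; the function names are hypothetical.

```python
def count_claps(levels, frame_s, threshold, max_len_s):
    """Return onset times (s) of short bursts at or above `threshold`.

    `levels` is a per-frame level envelope; a burst counts as a clap only if
    it stays above the threshold for at most max_len_s seconds.
    """
    onsets, start = [], None
    for i, lv in enumerate(list(levels) + [float("-inf")]):  # sentinel ends last burst
        if lv >= threshold and start is None:
            start = i
        elif lv < threshold and start is not None:
            if (i - start) * frame_s <= max_len_s:            # short enough: a clap
                onsets.append(start * frame_s)
            start = None
    return onsets


def rhythm_command(onsets, window_s=3.0):
    """Map the number of claps within window_s of the first clap to a command."""
    if not onsets:
        return None
    n = sum(1 for t in onsets if t - onsets[0] <= window_s)
    return {1: "louder", 2: "softer"}.get(n)
```

Rejecting long bursts keeps sustained sounds such as speech or music from being counted as claps.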

The control unit 4 performs directivity setting processing, which sets the delay amounts and gains of the beam control units 7A and 7B on the basis of the sound source position found by the sound source position detection processing and the command content found by the command analysis processing. Specific examples of directivity setting are described below with reference to the drawings. In every example, the user is assumed to have first uttered a trigger phrase such as “command input”.

FIG. 5 shows examples of controlling a sound beam as part of directivity setting. FIG. 5(A) shows an example of directing a sound beam toward the user. When user h1 says “Source A, here”, the control unit 4 extracts “source A” as a source-selection command and “here” as a beam-setting command, and detects the position of user h1. It then sets the delay amounts of beam control unit 7A so that the audio of source A (movie audio in the illustrated example) is directed toward user h1. Thus, wherever user h1 is, simply saying “Source A, here” directs the sound beam toward him or her.

FIG. 5(B) shows an example of changing the volume of a sound beam directed toward the user. When user h1 says “Source A, louder”, the control unit 4 extracts “source A” as a source-selection command and “louder” as a beam-setting command, and detects the position of user h1. It then sets the gain of beam control unit 7A so that the volume of source A's sound beam increases. If the detected position of user h1 has drifted away from the beam direction, the delay amounts of beam control unit 7A may also be reset so that the beam points at user h1. Thus, wherever user h1 is, simply saying “Source A, louder” raises the volume of source A at his or her position only.
The directivity settings of FIGS. 5(A) and 5(B) are suitable, for example, for enjoying television or music at night, or for situations in which other household sounds are loud and the movie audio is hard to hear.

Next, FIG. 6 shows an example in which a sound dip is directed toward the user. In the figure, when the user h1 says “Source A, mute only here”, the control unit 4 extracts “source A” as the source selection command and “mute only here” as the beam setting command. The control unit 4 also detects the position of the user h1. The control unit 4 then sets the delay amount of the beam control unit 7A so that the volume of source A drops only at the position of the user h1 (that is, so that the sound dip indicated by the two-dot dashed line in the figure is directed there). As a result, the user h1 can direct the sound dip toward himself or herself simply by saying “Source A, mute only here” wherever he or she happens to be.
The example in the figure is suitable when the user is enjoying TV or music and wants to lower the volume temporarily because of an incoming telephone call.
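The patent states only that the delay amounts of the beam control unit 7A are set so that a dip reaches the user. One way to obtain such a dip, sketched below as an assumption rather than as the patent's method, is to align the arrivals at the user (as for a focused beam) and then invert the polarity of every other unit, scaling each gain by that unit's distance so the contributions cancel exactly at the target point. The sketch assumes point-source units in free field and an even number of units; `dip_settings` and `response` are hypothetical names.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value


def dip_settings(unit_xs, target, c=SPEED_OF_SOUND):
    """Hypothetical: delays and signed gains that cancel the array output at `target`.

    Delays align every unit's arrival at the target; gains are proportional to
    each unit's distance (undoing 1/r spreading) and alternate in sign, so with
    an even number of units the contributions cancel at every frequency.
    """
    if len(unit_xs) % 2:
        raise ValueError("this sketch needs an even number of units")
    tx, ty = target
    dists = [math.hypot(tx - x, ty) for x in unit_xs]
    farthest = max(dists)
    delays = [(farthest - d) / c for d in dists]
    gains = [(1 if i % 2 == 0 else -1) * d / farthest for i, d in enumerate(dists)]
    return delays, gains


def response(unit_xs, delays, gains, point, freq, c=SPEED_OF_SOUND):
    """Magnitude of the narrowband free-field pressure at `point` (arbitrary units)."""
    px, py = point
    omega = 2 * math.pi * freq
    total = 0j
    for x, tau, g in zip(unit_xs, delays, gains):
        r = math.hypot(px - x, py)
        # Phase from the propagation delay r / c plus the unit's own delay tau.
        total += g * cmath.exp(-1j * omega * (r / c + tau)) / r
    return abs(total)
```

Away from the target point the alternating contributions no longer line up, so the sound is still heard elsewhere; a practical design would also have to check the level this pattern produces in the rest of the room.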

Next, FIG. 7 shows examples in which a sound beam is directed in a direction other than the user's (a specific direction). In FIG. 7(A), when the user h1 says “Source A, opposite direction”, the control unit 4 extracts “source A” as the source selection command and “opposite direction” as the beam setting command. The control unit 4 also detects the position of the user h1. The control unit 4 then sets the delay amount of the beam control unit 7A so that the sound beam of source A is directed in the direction opposite to the user. Here, the opposite direction means the position that is symmetric to the user's position about the axis Y, which passes through the center position O of the speaker array 5 perpendicular to the long axis of the array. In the example in the figure, the user h2 is located in the direction opposite to the user h1, so the sound beam of source A is directed at the user h2.
As described above, the user h1 can direct the sound beam in a direction different from his or her own simply by saying “Source A, opposite direction” wherever he or she happens to be. It is also possible to set a plurality of beam directions in advance and direct the sound beam in one of those directions.
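The “opposite direction” of FIG. 7(A) is a reflection about axis Y. With x measured along the long axis of the speaker array 5 and y perpendicular to it, both from the center position O (a coordinate convention assumed here), the mirrored point simply negates x. The hypothetical helpers below compute that point and the steering angle seen from O.

```python
import math


def opposite_position(user_xy):
    """Mirror a position about axis Y (through centre O, perpendicular to the array).

    Assumed coordinates: x along the array's long axis, y perpendicular to it, origin at O.
    """
    x, y = user_xy
    return (-x, y)


def steering_angle_deg(point_xy):
    """Angle of `point_xy` as seen from O, measured from axis Y (positive toward +x)."""
    x, y = point_xy
    return math.degrees(math.atan2(x, y))
```

A user detected at (1.0, 1.0), 45 degrees to one side of axis Y, maps to (-1.0, 1.0), 45 degrees to the other side; the mirrored point can then serve as the focus target for the beam of source A.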

In FIG. 7(B), the control unit 4 has a plurality of directions, directions 1 to 3, set in advance as directions in which the sound beam can be directed. The number of directions is not limited to this example. When the user h1 says “Source A, direction 1”, the control unit 4 extracts “source A” as the source selection command and “direction 1” as the beam setting command. The control unit 4 then sets the delay amount of the beam control unit 7A so that the sound beam of source A is directed in the preset direction 1.
The example of FIG. 7 is suitable when the user is enjoying music and wants to let other people hear it. It is also suitable when the user is enjoying TV or music, receives a telephone call, and wants to direct the sound beam elsewhere temporarily.
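For the preset directions of FIG. 7(B), each direction number can be stored as an angle and converted into far-field steering delays using the standard plane-wave delay law for a line array, tau_n = x_n sin(theta) / c, shifted so that no delay is negative. The angles in `PRESET_DIRECTIONS` and the function name are hypothetical; the patent says only that directions 1 to 3 are set in advance.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value

# Hypothetical preset table: direction number -> angle from axis Y in degrees.
PRESET_DIRECTIONS = {1: -40.0, 2: 0.0, 3: 40.0}


def steering_delays(unit_xs, angle_deg, c=SPEED_OF_SOUND):
    """Far-field (plane-wave) delays that tilt the beam `angle_deg` away from axis Y.

    Units on the side the beam leans toward are delayed more, so that the
    wavefronts from all units line up in the chosen direction.
    """
    s = math.sin(math.radians(angle_deg))
    raw = [x * s / c for x in unit_xs]
    # Shift so that the smallest delay is zero (a delay line cannot advance a signal).
    base = min(raw)
    return [t - base for t in raw]
```

Handling “Source A, direction 1” then reduces to `steering_delays(unit_xs, PRESET_DIRECTIONS[1])`; storing angles rather than raw delays keeps the table valid if the unit spacing changes.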

Next, FIG. 8 shows examples in which a sound dip is directed in a direction other than the user's (a specific direction). In FIG. 8(A), when the user h1 says “Source A, mute only in the opposite direction”, the control unit 4 extracts “source A” as the source selection command and “mute only in the opposite direction” as the beam setting command. The control unit 4 also detects the position of the user h1. The control unit 4 then sets the delay amount of the beam control unit 7A so that the volume of source A drops only in the direction opposite to the user (so that the sound dip indicated by the two-dot dashed line in the figure is directed there). In the example in the figure, the user h2 is located in the direction opposite to the user h1, so the volume of source A drops only at the position of the user h2.

As described above, the user h1 can direct the sound dip in a direction different from his or her own simply by saying “Source A, mute only in the opposite direction” wherever he or she happens to be. It is also possible to set a plurality of dip directions in advance and direct the sound dip in one of those directions.

In FIG. 8(B), the control unit 4 has a plurality of directions, directions 1 to 3, set as directions in which the sound dip can be directed. Here too, the number of directions is not limited to this example. When the user h1 says “Source A, mute only in direction 1”, the control unit 4 extracts “source A” as the source selection command and “mute only in direction 1” as the beam setting command. The control unit 4 then sets the delay amount of the beam control unit 7A so that the sound dip of source A is directed in the preset direction 1.
The example of FIG. 8 is suitable, for example, when the volume should be lowered only in the direction where a baby is sleeping. Also, if the direction of a telephone in the home is set in advance, the volume can be lowered only in the direction of the telephone when a call comes in.
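In each example of FIGS. 5 to 8, the control unit 4 splits the recognized utterance into a source selection command and a beam setting command. Assuming a speech recognizer has already produced text (the patent does not specify the recognizer or the exact phrases), a hypothetical keyword parser for the English renderings of the commands above might look like this; the phrase set and the `Command` fields are illustrative only.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Command:
    source: str                    # source selection command, e.g. "A"
    action: str                    # beam setting: "beam", "dip", or "louder"
    target: str                    # "user", "opposite", or "preset"
    preset: Optional[int] = None   # preset direction number when target == "preset"


# "Source <letter>" followed by the beam setting phrase (hypothetical grammar).
_SOURCE = re.compile(r"^source\s+([a-z])\b[,\s]*(.*)$", re.IGNORECASE)
_DIRECTION = re.compile(r"direction\s+(\d+)", re.IGNORECASE)


def parse_command(text: str) -> Optional[Command]:
    """Split recognized text into a source selection and a beam setting command."""
    m = _SOURCE.match(text.strip())
    if not m:
        return None  # not addressed to a source; ignore
    source, rest = m.group(1).upper(), m.group(2).lower()
    action = "dip" if "mute" in rest else "louder" if "louder" in rest else "beam"
    d = _DIRECTION.search(rest)
    if d:
        return Command(source, action, "preset", int(d.group(1)))
    if "opposite" in rest:
        return Command(source, action, "opposite")
    # No explicit direction: aim at the detected position of the speaker.
    return Command(source, action, "user")
```

The target "user" tells the control unit to use the detected sound source position, "opposite" its mirror image about axis Y, and "preset" a stored direction, matching the three cases described for FIGS. 5 to 8.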

As described above, with the sound emission and collection device of the present invention, the user can easily control the sound beam and the sound dip simply by speaking, without operating the main unit or a remote control to make complicated settings.

FIG. 1 is a block diagram showing the configuration of the sound emission and collection device.
FIG. 2 is a diagram showing the concept of forming a sound collection beam.
FIG. 3 is a block diagram showing the main configuration of the sound collection beam selection unit 71.
FIG. 4 is a diagram showing an example of listening to the sound of different sources at different locations.
FIG. 5 is a diagram showing an example of controlling a sound beam as an example of the directivity setting process.
FIG. 6 is a diagram showing an example of directing a sound dip toward the user.
FIG. 7 is a diagram showing an example of directing a sound beam in a direction other than the user's (a specific direction).
FIG. 8 is a diagram showing an example of directing a sound dip in a direction other than the user's (a specific direction).

Explanation of symbols

1 - Sound emission and collection device
2 - Microphone array
3 - Input/output interface
4 - Control unit
5 - Speaker array
6 - Echo canceller
7A, 7B - Beam control unit
8 - Mixer

Claims

A sound collection unit that collects sound and outputs a sound collection signal;
A sound source position detection unit for detecting a sound source position;
A sound emitting unit that emits sound with directivity in a specific direction;
A voice analysis unit that receives the collected sound signal and extracts a command, contained in the collected sound signal, that instructs directivity;
A control unit that sets a directivity pattern of the sound emitting unit based on the sound source position detected by the sound source position detection unit and on the content of the directivity-instructing command extracted by the voice analysis unit;
A sound emission and collection device.

The voice analysis unit further extracts a source selection command included in the collected sound signal,
The control unit sets the directivity pattern for the sound of the source selected by the source selection command extracted by the voice analysis unit,
The sound emission and collection device according to claim 1, wherein the sound emitting unit simultaneously emits the sounds of different sources with directivity in a plurality of directions based on the directivity patterns set by the control unit.

The voice analysis unit further extracts a trigger command included in the collected sound signal,
The sound emission and collection device according to claim 1, wherein the control unit sets the subsequent directivity pattern based on the content of the directivity-instructing command only when the voice analysis unit has extracted the trigger command.

The sound emission and collection device according to claim 1, wherein the voice analysis unit extracts a specific rhythm pattern included in the collected sound signal as a command.

The sound emission and collection device according to any one of claims 1 to 4, wherein the control unit sets a directivity pattern so that the volume decreases only in a predetermined direction.

An echo canceller that removes an echo component from the collected sound signal;
The sound emission and collection device according to claim 1, wherein the voice analysis unit extracts the command from the collected sound signal from which the echo component has been removed by the echo canceller.