JP2011066805A

JP2011066805A - Sound collection device and sound collection method

Info

Publication number: JP2011066805A
Application number: JP2009217413A
Authority: JP
Inventors: Takashi Yato (矢頭 隆)
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2009-09-18
Filing date: 2009-09-18
Publication date: 2011-03-31

Abstract

PROBLEM TO BE SOLVED: To provide a sound collection device and a sound collection method that can input a desired voice in good condition even when two or more speakers speak simultaneously.

SOLUTION: The sound collection device includes: a directivity forming unit 101 that forms sound collection directivity in two or more directions using two or more microphones MC1 to MCm; an audio signal detection unit 102 that detects the presence or absence of an audio signal collected from the two or more directions; and an utterance selection unit 103 that executes a simultaneous selection function of simultaneously selecting the audio signals from the two or more directions when audio signals are detected simultaneously from the two or more directions. Thus, even when two or more speakers speak simultaneously, the desired voice can be input in good condition.

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to a sound collection device and a sound collection method.

In systems such as video and audio conferencing, remote lectures, and IP telephony, an audio signal is input through a sound collection device such as a microphone and transmitted to a remote location. When an omnidirectional microphone is used as the sound collection device, however, ambient noise, reverberation, side conversations, and the like are picked up along with the voice of the intended speaker, which makes the desired voice hard to hear.

In view of this problem, Patent Documents 1 and 2 below disclose techniques that form sound collection directivity in two or more directions using two or more microphones and restrict the sound collection direction on the assumption that the desired sound source lies in the direction of the sound collection beam carrying the maximum-level signal among the two or more beams.

Patent Document 1 describes detecting the direction of the sound collection beam carrying the maximum-level signal, aiming the directivity in that direction, and inputting the audio signal collected there as the speaker's voice.

Patent Document 2 describes inputting, as the speaker's voice, the audio signals collected by the sound collection beam carrying the maximum-level signal and by the sound collection beams adjacent to it.

Patent Document 1: JP 2003-304589 A
Patent Document 2: JP 2007-13400 A

Both of the above methods restrict the sound collection direction on the assumption that the speaker is located in the direction of the sound collection beam carrying the maximum-level signal. In actual use of a conference system or the like, however, the speaker is not limited to one person, and two or more speakers often speak at the same time. The same situation arises not only in conference systems but also in systems such as remote lectures and IP telephony.

In this case, if sound collection is restricted to a single speaker, the voices of the other speakers are not input. If sound is collected without restricting the speakers, on the other hand, the voices of several speakers interfere with one another and the desired voice becomes hard to hear.

The present invention therefore aims to provide a sound collection device and a sound collection method that can input a desired voice in good condition even when two or more speakers speak at the same time.

According to an embodiment of the present invention, a sound collection device is provided that includes: a directivity forming unit that forms sound collection directivity in two or more directions using two or more microphones; an audio signal detection unit that detects the presence or absence of an audio signal collected from the two or more directions; and an utterance selection unit that executes a simultaneous selection function of simultaneously selecting the audio signals from the two or more directions when audio signals are detected simultaneously from the two or more directions.

With this configuration, even when audio signals are detected simultaneously from two or more directions, the audio signals from the two or more directions can be selectively input. As a result, even when two or more speakers speak at the same time, the desired voice can be input in good condition.

When executing the simultaneous selection function, the utterance selection unit need not select the direction of an audio signal whose level ratio, taken relative to the audio signal detected at the maximum level among the audio signals detected simultaneously from two or more directions, is less than a predetermined threshold.

When executing the simultaneous selection function, the utterance selection unit may also perform omnidirectional sound collection using at least one of the two or more microphones if audio signals are detected simultaneously from more than a predetermined number of directions.

The utterance selection unit may select and execute either the simultaneous selection function, which simultaneously selects audio signals from two or more directions, or a priority selection function, which selects the audio signal from one priority direction.

When executing the priority selection function, the utterance selection unit may, in accordance with a user instruction, select as the one priority direction the direction of the audio signal detected earliest among the audio signals detected simultaneously from two or more directions.

When executing the priority selection function, the utterance selection unit may, in accordance with a user instruction, select as the one priority direction the direction of the audio signal detected at the maximum level among the audio signals detected simultaneously from two or more directions.

The utterance selection unit may select either the simultaneous selection function or the priority selection function in accordance with a user instruction.

According to another embodiment of the present invention, a sound collection method is provided that includes the steps of: forming sound collection directivity in two or more directions using two or more microphones; detecting the presence or absence of an audio signal collected from the two or more directions; and executing a simultaneous selection function of simultaneously selecting the audio signals from the two or more directions when audio signals are detected simultaneously from the two or more directions.

According to the present invention, it is possible to provide a sound collection device and a sound collection method that can input a desired voice in good condition even when two or more speakers speak at the same time.

FIG. 1 is a block diagram showing the configuration of a sound collection device according to an embodiment of the present invention.
FIG. 2 is a diagram showing the basic principle of beamforming.
FIG. 3 is a diagram (1/2) explaining details of the sound collection device.
FIG. 4 is a diagram (2/2) explaining details of the sound collection device.
FIG. 5 is a flowchart showing a sound collection method according to an embodiment of the present invention.
FIG. 6A is a diagram (1/3) explaining processing during the priority selection operation.
FIG. 6B is a diagram (2/3) explaining processing during the priority selection operation.
FIG. 6C is a diagram (3/3) explaining processing during the priority selection operation.
FIG. 7A is a diagram (1/3) explaining processing during the simultaneous selection operation.
FIG. 7B is a diagram (2/3) explaining processing during the simultaneous selection operation.
FIG. 7C is a diagram (3/3) explaining processing during the simultaneous selection operation.

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings. In this specification and the drawings, components having substantially the same functional configuration are given the same reference numerals, and redundant description is omitted.

[1. Sound collection device]
A sound collection device according to an embodiment of the present invention is described below with reference to FIGS. 1 to 4. FIG. 1 is a block diagram showing the configuration of the sound collection device according to an embodiment of the present invention. FIG. 2 is a diagram showing the basic principle of beamforming. FIGS. 3 and 4 are diagrams explaining details of the sound collection device.

The sound collection device includes a microphone array composed of m microphones 100-1 to 100-m (MC1 to MCm), a directivity forming unit 101, an audio signal detection unit 102, an utterance selection unit 103, an operation unit 104, a selector 105, and a mixer 106. The directivity forming unit 101 applies predetermined signal processing to the signals input from the two or more microphones MC1 to MCm, which are arranged in a predetermined layout, and thereby forms sound collection directivity in an arbitrary direction.

In this embodiment, sound collection directivity is formed using the principle of beamforming in order to detect the direction of a speaker S. With beamforming, sound collection directivity can be formed in an arbitrary direction using two or more omnidirectional microphones. The following description assumes that the sound collection direction is divided into J directions.

Alternatively, the sound collection directivity may be formed by preparing J directional microphones 100-1 to 100-J and installing them so that each microphone points toward the center of one of the J equal arcs into which a circle is divided.

FIG. 2 shows a case in which a sound wave (plane wave) arriving from the direction $\theta$ is received by two microphones MC1 and MC2 placed a distance $l$ apart. The sound wave arriving from the direction $\theta$ is received by the microphone MC1, then propagates a further distance $d$ and is received by the microphone MC2. The distance $d$ is given by the following equation.

$$d = l \sin\theta \tag{1}$$

The received signal $x_2(t)$ of the microphone MC2 is therefore the received signal $x_1(t)$ of the microphone MC1 delayed by the time difference (delay amount) $\tau$ that the sound wave needs to travel the distance $d$. The received signal $x_2(t)$ and the delay amount $\tau$ are given by the following equations.

$$x_2(t) = x_1(t - \tau) \tag{2}$$

$$\tau = d / c \quad (c\text{: speed of sound}) \tag{3}$$
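Equations (1) and (3) can be checked numerically with a short Python sketch. The speed of sound (343 m/s), the 5 cm spacing, and the 30 degree arrival angle are illustrative assumptions and do not come from the specification; the function names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at roughly 20 degrees C


def path_difference(spacing_m, theta_rad):
    """Equation (1): d = l * sin(theta)."""
    return spacing_m * math.sin(theta_rad)


def arrival_delay(spacing_m, theta_rad, c=SPEED_OF_SOUND):
    """Equation (3): tau = d / c, in seconds."""
    return path_difference(spacing_m, theta_rad) / c


# Hypothetical geometry: microphones 5 cm apart, wave arriving at 30 degrees.
tau = arrival_delay(0.05, math.radians(30.0))
print(f"tau = {tau * 1e6:.1f} microseconds")
```

A wave arriving at $\theta = 0$ gives $d = 0$ and therefore no delay between the two microphones, and a negative $\theta$ gives a negative $\tau$, meaning MC2 receives the wave first.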

Consequently, when the delay amount $\tau$ is added to the received signal $x_1(t)$ and the result is added to the received signal $x_2(t)$, signals of the same phase are summed, and the sound wave (its amplitude) arriving from the specific direction $\theta$ is emphasized in the sum signal $b(t)$. If the sign of $\theta$ is reversed, the delay amount $\tau$ is added to the received signal $x_2(t)$ instead.

$$b(t) = x_2(t) + x_1(t - \tau) \tag{4}$$
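Equation (4) can be sketched in discrete time as follows. This is a minimal illustration assuming that the delay is a whole number of samples; the test tone, its period, and the three-sample delay are hypothetical values chosen only to make the effect visible.

```python
import math


def delay_samples(x, k):
    """Delay x by k whole samples, zero-padding the start and keeping the length."""
    k = max(0, min(k, len(x)))
    return [0.0] * k + list(x[:len(x) - k])


def delay_and_sum(x1, x2, k):
    """Discrete form of equation (4): b[n] = x2[n] + x1[n - k]."""
    return [a + b for a, b in zip(x2, delay_samples(x1, k))]


def energy(x):
    """Sum of squared samples."""
    return sum(v * v for v in x)


# Hypothetical test signal: a tone reaching MC1 first and MC2 three samples later.
x1 = [math.sin(2.0 * math.pi * n / 8.0) for n in range(64)]
x2 = delay_samples(x1, 3)

steered = delay_and_sum(x1, x2, 3)    # delay matches the arrival direction
unsteered = delay_and_sum(x1, x2, 0)  # delay does not match
print(energy(steered) > energy(unsteered))
```

When the applied delay matches the true inter-microphone delay, the two terms are identical and add in phase; with a mismatched delay they partially cancel, which is the directional emphasis described above.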

Directivity formation based on this principle can be performed in the frequency domain as well as in the time domain. The Fourier transform of a signal delayed by $\tau$ on the time axis equals the Fourier transform of the original signal multiplied by $e^{-j\omega\tau}$. Therefore, letting $B(\omega)$, $X_1(\omega)$, and $X_2(\omega)$ be the Fourier transforms of the sum signal $b(t)$ and the received signals $x_1(t)$ and $x_2(t)$, the sum signal $b(t)$ on the time axis is expressed on the frequency axis by the following equation.

$$B(\omega) = X_2(\omega) + e^{-j\omega\tau} X_1(\omega) \tag{5}$$
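For a single frequency component, equation (5) reduces to one complex multiplication and one addition. The sketch below assumes a 1 kHz component and a 70 microsecond inter-microphone delay, both hypothetical, and compares a matched steering delay with a mismatched one.

```python
import cmath
import math


def beam_bin(X1, X2, omega, tau):
    """Equation (5): B(w) = X2(w) + exp(-j*w*tau) * X1(w), for one frequency."""
    return X2 + cmath.exp(-1j * omega * tau) * X1


# Hypothetical values: a 1 kHz component and a 70 microsecond delay between microphones.
omega = 2.0 * math.pi * 1000.0
tau = 70e-6

X1 = 1.0 + 0.0j                         # spectrum at MC1
X2 = cmath.exp(-1j * omega * tau) * X1  # MC2 sees the same component delayed by tau

matched = beam_bin(X1, X2, omega, tau)      # steering delay equals the true delay
mismatched = beam_bin(X1, X2, omega, -tau)  # steering toward the mirrored direction
print(abs(matched), abs(mismatched))
```

With the matched delay both terms have the same phase and the magnitude doubles; with the mismatched delay the magnitude is smaller.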

On the time axis, digital processing allows the delay amount $\tau$ to be chosen only in multiples of the sampling period. On the frequency axis, in contrast, the delay amount $\tau$ can be chosen freely, so the pointing direction can be set arbitrarily by varying $\tau$.
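The difference between the two domains can be illustrated with a delay of half a sample, which cannot be expressed as a sample shift but is straightforward as a phase factor. The sketch below is an assumption-laden illustration: it uses a plain discrete Fourier transform written out by hand, treats the signal as periodic (so the delay is circular), and all names and sizes are hypothetical.

```python
import cmath
import math


def dft(x):
    """Direct O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Inverse of dft()."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def fractional_delay(x, delay):
    """Circularly delay x by a possibly non-integer number of samples."""
    n = len(x)
    X = dft(x)
    Y = []
    for k in range(n):
        signed_k = k if k <= n // 2 else k - n  # upper bins are negative frequencies
        Y.append(X[k] * cmath.exp(-2j * math.pi * signed_k * delay / n))
    return [v.real for v in idft(Y)]


# Hypothetical signal: three cycles of a cosine over 32 samples, delayed by 0.5 sample.
N, K0, DELAY = 32, 3, 0.5
x = [math.cos(2.0 * math.pi * K0 * t / N) for t in range(N)]
y = fractional_delay(x, DELAY)
expected = [math.cos(2.0 * math.pi * K0 * (t - DELAY) / N) for t in range(N)]
print(max(abs(a - b) for a, b in zip(y, expected)) < 1e-9)
```

In a practical device an FFT would replace the direct transform, as the FFT units described below suggest; the direct form is used here only to keep the sketch self-contained.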

FIG. 3 shows the configuration of the directivity forming unit 101, which performs directivity formation in the frequency domain. The directivity forming unit 101 includes time-frequency conversion units 301-1 to 301-m that use the fast Fourier transform (also called FFT units), a delay control unit 302, multiplication units 303-1 to 303-m, and an addition unit 304.

The audio signals $x_1(t)$ to $x_m(t)$ (see FIG. 4) collected by the m microphones MC1 to MCm, which are arranged in a straight line at intervals of the distance $l$, are converted into digital signals by A/D converters (not shown) and supplied to the directivity forming unit 101. Depending on the sound source direction $\theta$, arrival time differences $\tau, 2\tau, \ldots, (m-1)\tau$ relative to the microphone MC1 arise at the microphones MC2, MC3, ..., MCm.

Therefore, adding delay amounts corresponding to the arrival time differences $\tau, 2\tau, \ldots, (m-1)\tau$ to the received signals $x_{m-1}(t), x_{m-2}(t), \ldots, x_1(t)$ brings all the signals into phase. Summing the in-phase signals emphasizes only the sound arriving from the direction $\theta$. As the number of summed signals grows with the number of microphones MC, the gain in the pointing direction increases.

The directivity forming unit 101 implements this principle in the frequency domain. The received signals $x_1(t), x_2(t), \ldots, x_m(t)$ are converted into spectra $X_1(\omega), X_2(\omega), \ldots, X_m(\omega)$ by the FFT units 301-1 to 301-m, and the multiplication units 303-1 to 303-m multiply them by delay coefficients to add the delays.

The delay control unit 302 supplies the delay coefficients to the multiplication units 303-1 to 303-(m-1) according to the direction of the directivity. Assuming the time difference $\tau$ caused by the distance difference $d$, the delay coefficients are $e^{-j\omega(m-1)\tau}, e^{-j\omega(m-2)\tau}, \ldots, e^{-j\omega\tau}$. If the sign of $\theta$ is reversed, the microphone MCm is closest to the sound source, so the maximum delay amount $(m-1)\tau$ is added to the spectrum $X_m(\omega)$.
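The delay coefficients above can be sketched for one frequency component as follows. The sketch assumes that microphone i (counting from 1) receives the wave $(i-1)\tau$ after the microphone MC1 and is multiplied by $e^{-j\omega(m-i)\tau}$, so that the coefficient for MCm is 1; the array size, frequency, and delay are hypothetical values.

```python
import cmath
import math


def delay_coefficients(m, omega, tau):
    """Coefficient for microphone i (1-based) is exp(-j*omega*(m - i)*tau)."""
    return [cmath.exp(-1j * omega * (m - i) * tau) for i in range(1, m + 1)]


def beam_output(spectra, omega, tau):
    """Multiply each microphone spectrum by its coefficient and add the products."""
    coeffs = delay_coefficients(len(spectra), omega, tau)
    return sum(c * X for c, X in zip(coeffs, spectra))


# Hypothetical case: m = 4 microphones, one frequency component at 1 kHz,
# and microphone i receiving the wave (i - 1) * tau after microphone 1.
m, omega, tau = 4, 2.0 * math.pi * 1000.0, 70e-6
spectra = [cmath.exp(-1j * omega * (i - 1) * tau) for i in range(1, m + 1)]

B = beam_output(spectra, omega, tau)
print(abs(B))
```

When the steering delay matches the arrival direction, every product has the same phase and the magnitude of the sum equals the number of microphones, which corresponds to the gain increase described above.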

The delay control unit 302 controls the delay amounts so that directivity is formed evenly around the microphone array. This yields the collected signal spectra $B_1(\omega), B_2(\omega), \ldots, B_J(\omega)$ for the respective directions.

$B_0(\omega)$ in FIG. 3 is the collected signal spectrum of an arbitrary microphone MC output directly, that is, an omnidirectional signal with no directivity. In FIG. 3, the spectrum $X_1(\omega)$ of the received signal $x_1(t)$ is output as $B_0(\omega)$, but the received signal of another microphone MC may be output instead.

Returning to FIG. 1, the audio signal detection unit 102 detects the presence or absence of an audio signal in the collected signal from each direction. Voice detection uses any known method capable of detecting whether an audio signal is present in the collected signal. For example, with the signal level as the criterion, the start of voice input is determined when a received signal at or above a predetermined level continues for a predetermined time or longer, and the end of voice input is determined when a received signal below the predetermined level continues for a predetermined time or longer. The audio signal detection unit 102 detects the presence or absence of an audio signal for the signals from all collected directions and supplies the detection results and the signal levels (level information) to the utterance selection unit 103.
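The level-based detection given as an example above can be sketched as a small state machine over per-frame levels. This is only one possible reading: the frame-based representation, the parameter names, and the separate start and end durations are assumptions made for illustration.

```python
def detect_voice(levels, threshold, start_frames, end_frames):
    """Return one boolean per frame: True while voice input is considered active.

    Voice input starts after `start_frames` consecutive frames at or above
    `threshold`, and ends after `end_frames` consecutive frames below it.
    """
    active = False
    run = 0  # consecutive frames that disagree with the current state
    flags = []
    for level in levels:
        above = level >= threshold
        if above != active:
            run += 1
            needed = end_frames if active else start_frames
            if run >= needed:
                active = not active
                run = 0
        else:
            run = 0
        flags.append(active)
    return flags


# Hypothetical frame levels for one direction.
levels = [0, 5, 5, 5, 0, 5, 0, 0, 0]
print(detect_voice(levels, threshold=3, start_frames=2, end_frames=3))
```

The single quiet frame in the middle of the example does not end the detection, because the level has to stay below the threshold for the full end duration.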

When audio signals are detected simultaneously from two or more directions, the utterance selection unit 103 performs either a simultaneous selection operation, which simultaneously selects the audio signals from the two or more directions, or a priority selection operation, which selects the audio from one priority direction. The user can specify the simultaneous selection operation or the priority selection operation as the operation mode. The operation mode is specified by the user through a DIP switch or the like on the operation unit 104 and is notified to the utterance selection unit 103.

[2. Sound collection method]
A sound collection method according to an embodiment of the present invention is described below with reference to FIGS. 5 to 7. FIG. 5 is a flowchart showing the sound collection method according to an embodiment of the present invention. FIGS. 6A to 6C and FIGS. 7A to 7C are diagrams explaining the processing of the utterance selection unit 103 during the priority selection operation and during the simultaneous selection operation, respectively.

First, the utterance selection unit 103 evaluates the number of directions in which an audio signal has been detected (voice detection count n) (step S301). If n = 0, no signal with directivity in a specific direction has been detected, so the omnidirectional signal is selected and omnidirectional recording is performed (S309). If n = 1, the direction in which the audio signal was detected is selected (S302).

If n > 1, the utterance selection unit 103 checks which operation mode has been specified (S303) and selects the simultaneous selection operation or the priority selection operation. When the priority selection operation is selected, the selection criterion for the priority direction is then chosen (S304). Like the operation mode, the selection criterion is specified by the user through the operation unit 104 and notified to the utterance selection unit 103.
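The branching on the voice detection count n and the operation mode (steps S301 to S303, S302, and S309) can be sketched as follows. The function and parameter names are hypothetical, direction index 0 is assumed to stand for the omnidirectional signal, and the two selection rules are passed in as callables so that the sketch stays independent of how they are implemented.

```python
OMNI = 0  # assumed index of the omnidirectional signal B0


def choose_directions(detected, mode, priority_rule, simultaneous_rule):
    """Return the list of direction indices to collect from.

    detected: direction indices in which voice is currently detected.
    mode: "priority" or "simultaneous", as specified by the user.
    """
    n = len(detected)
    if n == 0:
        return [OMNI]          # S309: omnidirectional recording
    if n == 1:
        return list(detected)  # S302: the single detected direction
    if mode == "priority":
        return [priority_rule(detected)]
    if mode == "simultaneous":
        return list(simultaneous_rule(detected))
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical rules used only to exercise the branching.
first_listed = lambda dirs: dirs[0]
keep_all = lambda dirs: dirs

print(choose_directions([], "priority", first_listed, keep_all))
print(choose_directions([2, 3], "simultaneous", first_listed, keep_all))
```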

Under the first selection criterion, the direction of the audio signal that was detected earliest among the audio signals being detected simultaneously from two or more directions is selected. In this case, as long as voice at or above the predetermined level is being collected, the voice of the preceding priority speaker Sp can continue to be collected regardless of the level of the audio signal of a following speaker S', but the audio signal of the following speaker S' cannot be selected.

Under the second selection criterion, the direction of the audio signal that is detected at the maximum level among the audio signals being detected simultaneously from two or more directions is selected. In this case, the audio signal of the following speaker S' can be selected if its level is higher than that of the audio signal of the preceding priority speaker Sp, but the voice of the preceding speaker S cannot continue to be collected.

When the first selection criterion is selected, the speaker S in the direction in which an audio signal was detected first is treated as the priority speaker Sp, and that direction remains selected until voice is no longer detected there (S305). When the second selection criterion is selected, the speaker S in the direction of the voice detected at the maximum level is treated as the priority speaker Sp, and that direction remains selected until the direction of the audio signal detected at the maximum level changes (S306).
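The two selection criteria can be sketched as a choice between the earliest onset and the highest level. The data layout (a dictionary from direction index to an onset time and a level) and the criterion names are assumptions for illustration.

```python
def select_priority(detections, criterion):
    """Pick the single priority direction.

    detections: dict mapping a direction index to (onset_time, level) for each
        direction in which voice is currently detected.
    criterion: "first" (earliest detected) or "loudest" (maximum level).
    """
    if criterion == "first":
        return min(detections, key=lambda d: detections[d][0])
    if criterion == "loudest":
        return max(detections, key=lambda d: detections[d][1])
    raise ValueError(f"unknown criterion: {criterion}")


# Hypothetical situation: speaker 1 started first, speaker 2 started later but louder.
detections = {1: (0.0, 0.4), 2: (1.5, 0.9)}
print(select_priority(detections, "first"))
print(select_priority(detections, "loudest"))
```

Because the function is re-evaluated on the currently detected directions, the first criterion keeps the earlier speaker until that direction is no longer detected, while the second follows whichever direction is currently loudest.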

FIGS. 6A to 6C show the processing during the priority selection operation based on the selection criterion. In FIG. 6A, the voice of a speaker S1 (priority speaker Sp) is received by a sound collection beam B1. Suppose that a speaker S2 then starts speaking at a louder volume than the speaker S1. As shown in FIG. 6B, if the first selection criterion is selected, the speaker S1 remains the priority speaker Sp, collection of the audio signal of the preceding speaker S1 is prioritized, and reception of the audio signal of the speaker S1 by the sound collection beam B1 continues. As shown in FIG. 6C, if the second selection criterion is selected, the speaker S2 replaces the speaker S1 as the priority speaker Sp, collection of the audio signal of the louder speaker S2 is prioritized, and reception of the audio signal of the speaker S2 by a sound collection beam B2 starts in place of the sound collection beam B1.

When the simultaneous selection operation is selected in step S303, the voice level in each direction in which an audio signal is detected is evaluated, and the number of directions satisfying a predetermined criterion is obtained as the voice detection count n' (S307). The audio signal in a direction is judged to satisfy the predetermined criterion when its level is at or above a predetermined ratio of the level of the maximum-level audio signal.

It is then determined whether the voice detection count n' exceeds a predetermined threshold $n_{max}$ (S308). When audio signals are detected simultaneously from many directions, all of those directions could be selected, but if the number of directions becomes too large, there is little point in using directivity formation to collect audio signals from specific directions. Narrowing the sound collection direction makes utterances easier to hear but makes it harder to convey the atmosphere carried by ambient sound from other directions. If the number of simultaneous utterances becomes too large, the utterances become hard to hear and the atmosphere also becomes hard to convey. In that situation it is preferable to switch to omnidirectional collection so that the atmosphere is conveyed more easily.

For this reason, this embodiment sets a threshold $n_{max}$ on the number of directions that can be selected simultaneously. When the voice detection count n' exceeds the threshold $n_{max}$, no specific direction is selected; the omnidirectional signal is selected and omnidirectional recording is performed (S309).

When the voice detection count n' is at or below the threshold $n_{max}$, the directions satisfying the predetermined criterion described above are selected simultaneously (S310). Directions that do not satisfy the criterion are excluded because, when the level difference between the detected signals is large, it is preferable not to select the direction of an audio signal whose level is below the predetermined ratio of the maximum-level audio signal, in order to avoid interference.
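Steps S307 to S310 can be sketched as a ratio filter followed by a count check. The parameter names `min_ratio` and `n_max`, the use of index 0 for the omnidirectional signal, and the example levels are assumptions made for illustration.

```python
OMNI = 0  # assumed index of the omnidirectional signal B0


def select_simultaneous(levels, min_ratio, n_max):
    """Return the direction indices to collect from in simultaneous selection.

    levels: dict mapping a direction index to the level detected in that direction
        (must contain at least one entry).
    min_ratio: a direction qualifies if its level is at least min_ratio times
        the maximum level (S307).
    n_max: if more than n_max directions qualify, fall back to the
        omnidirectional signal (S308, S309); otherwise select them all (S310).
    """
    peak = max(levels.values())
    qualified = sorted(d for d, level in levels.items() if level >= min_ratio * peak)
    if len(qualified) > n_max:
        return [OMNI]
    return qualified


# Hypothetical levels: one dominant speaker, then five comparable speakers.
print(select_simultaneous({1: 1.0, 2: 0.1, 3: 0.2, 4: 0.05}, min_ratio=0.5, n_max=4))
print(select_simultaneous({1: 1.0, 2: 0.9, 3: 0.8, 4: 0.9, 5: 0.7}, min_ratio=0.5, n_max=4))
```

The maximum-level direction always qualifies, so the filter never returns an empty selection.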

FIGS. 7A to 7C show the processing during the simultaneous selection operation based on the voice detection count n'. In FIG. 7A, the audio signals of speakers S1 to S4 are received by sound collection beams B1 to B4. The threshold $n_{max}$ for the voice detection count n' is set to 4. As shown in FIG. 7B, when the audio signals of the speakers S2 to S3 have levels below the predetermined ratio relative to the audio signal of the speaker S1, which is detected at the maximum level among the audio signals of the speakers S1 to S4, the audio signal of the speaker S1 is received by the sound collection beam B1 alone instead of by the sound collection beams B1 to B4.

As shown in FIG. 7C, when a speaker S5 starts speaking, the voice detection count n' (= 5) exceeds the threshold $n_{max}$ (= 4), so omnidirectional recording is performed with a sound collection beam B0 instead of the sound collection beams B1 to B4. In FIG. 7C, the sound collection beam B0 is drawn only in front of the microphones MC1 to MCm, but it is also formed at the sides and rear.

The utterance selection unit 103 then notifies the selector 105 of the selected pointing directions as pointing direction selection information (S311). Based on the selection information, the selector 105 extracts the signals for the directions to be selected from the output signals of the directivity forming unit 101 and supplies them to the mixer 106. The mixer 106 mixes the supplied signals and outputs the result as the transmission audio signal.
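The selector and mixer stage can be sketched as picking the selected beam signals and adding them element by element. The specification does not state how the mixer combines the signals, so plain summation is an assumption here, as are the function name and the data layout.

```python
def select_and_mix(beams, selected):
    """Extract the selected beam signals and mix them by element-wise summation.

    beams: dict mapping a direction index to that beam's samples (or spectrum
        bins); index 0 is assumed to hold the omnidirectional signal.
    selected: direction indices reported by the utterance selection step.
    """
    chosen = [beams[d] for d in selected]  # selector: keep only the chosen beams
    if not chosen:
        return []
    return [sum(values) for values in zip(*chosen)]  # mixer: add them up


# Hypothetical beam outputs, three values each.
beams = {
    0: [1.0, 1.0, 1.0],
    1: [0.5, 0.0, -0.5],
    2: [0.25, 0.25, 0.25],
}
print(select_and_mix(beams, [1, 2]))
print(select_and_mix(beams, [0]))
```

A practical mixer might also scale the sum to avoid clipping when several directions are selected; that choice is outside what the text describes.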

[3. Summary]
As described above, with the sound collection device and the sound collection method according to the above embodiment, even when audio signals are detected simultaneously from two or more directions, the audio signals from the two or more directions can be selectively input. As a result, even when two or more speakers S speak at the same time, the desired voice can be input in good condition.

The preferred embodiments of the present invention have been described in detail above with reference to the accompanying drawings, but the present invention is not limited to these examples. It is clear that a person having ordinary knowledge in the technical field to which the present invention pertains could conceive of various changes or modifications within the scope of the technical idea described in the claims, and these are naturally understood to belong to the technical scope of the present invention as well.

DESCRIPTION OF SYMBOLS
100 Microphone
101 Directivity forming unit
102 Voice signal detection unit
103 Utterance selection unit
104 Operation unit
105 Selector
106 Mixer
301 Time-frequency conversion unit (FFT unit)
302 Delay control unit
303 Multiplication unit
304 Addition unit

Claims

A sound collection device comprising:
a directivity forming unit that forms sound collection directivity in two or more directions using two or more microphones;
an audio signal detection unit that detects the presence or absence of an audio signal collected from the two or more directions; and
an utterance selection unit that executes a simultaneous selection function of simultaneously selecting audio signals from the two or more directions when audio signals are detected simultaneously from the two or more directions.

The sound collection device according to claim 1, wherein, when executing the simultaneous selection function, the utterance selection unit does not select the direction of any audio signal whose level ratio, taken relative to the audio signal detected at the maximum level among the audio signals detected simultaneously from the two or more directions, is less than a predetermined threshold.

The sound collection device according to claim 1, wherein, when executing the simultaneous selection function, the utterance selection unit causes omnidirectional sound collection to be performed using at least one of the two or more microphones when audio signals are simultaneously detected from more than a predetermined number of directions.

The sound collection device according to any one of claims 1 to 3, wherein the utterance selection unit selects and executes either the simultaneous selection function of simultaneously selecting audio signals from the two or more directions or a priority selection function of selecting an audio signal from one priority direction.

The sound collection device according to claim 4, wherein, when executing the priority selection function, the utterance selection unit selects, in accordance with a user instruction, the direction of the audio signal detected earliest among the audio signals detected simultaneously from the two or more directions as the one priority direction.

The sound collection device according to claim 4, wherein, when executing the priority selection function, the utterance selection unit selects, in accordance with a user instruction, the direction of the audio signal detected at the maximum level among the audio signals detected simultaneously from the two or more directions as the one priority direction.

The sound collection device according to claim 4, wherein the utterance selection unit selects either the simultaneous selection function or the priority selection function in accordance with a user instruction.

A sound collection method comprising:
forming sound collection directivity in two or more directions using two or more microphones;
detecting the presence or absence of an audio signal collected from the two or more directions; and
executing a simultaneous selection function of simultaneously selecting audio signals from the two or more directions when audio signals are detected simultaneously from the two or more directions.