JP2007174011A - Sound pickup device

Info

Publication number: JP2007174011A
Application number: JP2005365769A
Authority: JP
Inventors: Hiroshi Kayama; 啓嘉山
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2005-12-20
Filing date: 2005-12-20
Publication date: 2007-07-05

Abstract

PROBLEM TO BE SOLVED: To pick up sound with a high S/N ratio, without impairing frequency characteristics, even when the sound source moves.

SOLUTION: An output signal synthesizer 30 synthesizes and outputs a final digital audio signal SS from digital audio signals S-k (k = 1 to m) obtained from microphones 11-k (k = 1 to m). Extractors 20-k (k = 1 to m) output sound strength signals Es-k (k = 1 to m) from the digital audio signals S-k (k = 1 to m). A switching control unit 40 controls the output signal synthesizer 30 on the basis of the sound strength signals Es-k (k = 1 to m), so that a signal with a strong sound component is output.

COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to a sound collection device that collects external sound and outputs an electrical signal.

Collecting a speaker's voice in a noisy environment, such as an audio conference, requires a sound collection device that picks up the voice emitted from the speaker's mouth (the sound source) with a high S/N ratio. To meet this need, Patent Documents 1 and 2 propose combining unidirectional microphones and superimposing the output signals of the microphones to form the output.
Patent Document 1: Japanese Patent Laid-Open No. 10-126876
Patent Document 2: Japanese Patent Laid-Open No. 2000-188795

With the techniques disclosed in Patent Documents 1 and 2, the output signals of the microphones are superimposed in phase only when the sound originates from a specific position or direction. This makes it possible to target the voice emitted from the speaker's mouth and collect it with a high S/N ratio. However, it is difficult to superimpose the output signals of a plurality of microphones in phase over a wide frequency band, so sound collected with these techniques suffers from impaired frequency characteristics. A further problem is that, when the position of the sound source moves (for example, when the speaker's mouth moves), these techniques have difficulty collecting the sound from that source with a high S/N ratio.

The present invention has been made in view of the circumstances described above. Its object is to provide a sound collection device that can collect sound with a high S/N ratio, without impairing frequency characteristics, even when the sound source moves.

The present invention provides a sound collection device comprising: a plurality of microphones that collect sound from the outside and output electrical signals; output signal synthesis means that synthesizes, from the output signals of the plurality of microphones, the audio signal to be output; extraction means that extracts at least a voice component from each output signal of the plurality of microphones and outputs a signal indicating the intensity or S/N ratio of the voice component of each signal; and switching control means that, based on the output signals of the extraction means, executes a switching control process for controlling the output signal synthesis means so that, among the output signals of the plurality of microphones, a signal whose voice component has a high intensity or S/N ratio is output by the output signal synthesis means.
According to this invention, the output signal synthesis means outputs, among the output signals of the plurality of microphones, a signal whose voice component has a high intensity or S/N ratio. Therefore, even when the sound source moves, the sound it generates can be collected with a high S/N ratio without impairing frequency characteristics.

Embodiments of the present invention are described below with reference to the drawings.
<Configuration of Embodiment>
FIG. 1 is a block diagram showing the configuration of a sound collection device according to an embodiment of the present invention. As shown in FIG. 1, the sound collection device of this embodiment has m microphones 11-k (k = 1 to m). FIGS. 2 and 3 each show an example of how the microphones 11-k (k = 1 to m) are mounted in the sound collection device. These figures show the case in which the number of microphones m is 3.

The sound collection device of this embodiment may be configured as an independent device or may be incorporated into another device. FIG. 2 shows a microphone mounting example for the former case. In this sound collection device, three microphones 11-1 to 11-3 are fixed to a horizontal bar 502 that is fixed to the top of a stand 501. FIG. 3 shows, as an example of the latter case, a microphone mounting example in a notebook personal computer that incorporates the sound collection device of this embodiment. In this example, three microphones 11-1 to 11-3 are fixed to the upper part of the display 503 of the notebook computer.

The microphones 11-k (k = 1 to m) used in this embodiment are unidirectional microphones whose sensitivity depends on the direction from which the sound arrives. Let the axis pointing in the direction of maximum sensitivity of a microphone be called its maximum sensitivity axis. In the examples shown in FIGS. 2 and 3, the microphones 11-1 to 11-3 point their maximum sensitivity axes diagonally to the right of, directly in front of, and diagonally to the left of the sound collection device or notebook computer, respectively. In this way, the m microphones 11-k (k = 1 to m) of this embodiment are fixed to the sound collection device so that their maximum sensitivity axes extend radially.

The speaker talks in front of these microphones 11-k (k = 1 to m), but when the speaker moves, the microphone best suited to picking up the speaker's voice changes with the speaker's position. For example, in the examples shown in FIGS. 2 and 3, when the speaker's mouth is on the left side of the sound collection device or notebook computer, the output signal of the microphone 11-1, whose maximum sensitivity axis points toward the speaker's mouth, has the highest level, and this output signal is the appropriate one to represent the speaker's voice. However, when the speaker changes posture and the speaker's mouth moves directly in front of the microphone 11-2, the output signal of the microphone 11-2 has the highest level, and it is better to adopt that output signal to represent the speaker's voice.

Accordingly, the sound collection device of this embodiment monitors the level of the voice component in each output signal of the microphones 11-k (k = 1 to m), selects in principle the signal with the highest level, and outputs it as the final digital audio signal SS, so that the directivity of the device as a whole follows the direction of the sound source (in this example, the speaker's mouth). In addition, for the convenience of a downstream device (not shown) that receives and processes the final digital audio signal SS, the sound collection device generates and outputs a signal indicating the S/N ratio of the digital audio signal SS (hereinafter, the S/N ratio signal). The circuit configuration used to obtain the digital audio signal SS and the S/N ratio signal is described below.

In FIG. 1, the A/D converters 12-k (k = 1 to m) sample the analog audio signals output from the microphones 11-k (k = 1 to m) at a constant sampling period and convert them into digital audio signals S-k (k = 1 to m) representing the sample values. The digital audio signals S-k (k = 1 to m) are input to the respective extraction units 20-k (k = 1 to m) and also to the output signal synthesis unit 30.

The extraction units 20-k (k = 1 to m) are circuits that extract, from each of the digital audio signals S-k (k = 1 to m), a voice intensity signal Es-k (k = 1 to m) indicating the intensity of the voice component and a noise intensity signal En-k (k = 1 to m) indicating the intensity of the noise component. In this embodiment, the levels of the voice intensity signals Es-k (k = 1 to m) are compared to decide which of the digital audio signals S-k (k = 1 to m) is output as the final digital audio signal SS. The S/N ratio signal is computed from the voice intensity signals Es-k (k = 1 to m) and the noise intensity signals En-k (k = 1 to m).

FIG. 4 is a block diagram showing the configuration of each of the extraction units 20-k (k = 1 to m). In FIG. 4, a BPF (band-pass filter) 21 has a pass band of, for example, 300 to 3000 Hz and passes the voice-frequency components contained in the digital audio signal S-k. The output signal of the BPF 21 indicates the intensity of the voice component in the digital audio signal S-k, but its value changes rapidly and frequently. If the output signal of the BPF 21 were output directly as the voice intensity signal Es-k, the digital audio signal S-k selected as the digital audio signal SS would be switched frequently and operation would become unstable. An envelope generation unit 22 is therefore provided after the BPF 21. The envelope generation unit 22 outputs a voice intensity signal Es-k representing an envelope in which the rapid changes in the output signal of the BPF 21 are smoothed. Specifically, the envelope generation unit 22 has an effective value calculation circuit and an LPF (low-pass filter). The effective value calculation circuit divides the output signal of the BPF 21 into frames of a predetermined number of samples and calculates, for each frame, an effective value that is the mean square of the samples. The LPF removes rapid changes in the effective value obtained for each frame and outputs the voice intensity signal Es-k representing the envelope of the effective value.
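The envelope generation described above can be sketched as follows. This is a minimal illustration, not the circuit of the embodiment: the function name `envelope`, the frame length, and the use of a one-pole smoothing filter with coefficient `alpha` are all assumptions, since the text only specifies "frames of a predetermined number of samples" and "an LPF". The input is taken to be the already band-pass-filtered signal.

```python
def envelope(bpf_out, frame_len=4, alpha=0.5):
    """Return a smoothed per-frame intensity for a band-limited signal.

    bpf_out   : list of samples (output of the BPF, assumed given)
    frame_len : samples per frame (assumed value)
    alpha     : smoothing coefficient of a one-pole low-pass filter
                (assumed filter type; the source only says "LPF")
    """
    env = []
    smoothed = 0.0
    # Only complete frames are processed; a trailing partial frame is ignored.
    for start in range(0, len(bpf_out) - frame_len + 1, frame_len):
        frame = bpf_out[start:start + frame_len]
        # "Effective value" as defined in the text: mean square of the frame.
        mean_sq = sum(x * x for x in frame) / frame_len
        # One-pole low-pass filter removes rapid frame-to-frame changes.
        smoothed = alpha * mean_sq + (1.0 - alpha) * smoothed
        env.append(smoothed)
    return env
```

A smaller `alpha` yields a slower, steadier envelope and therefore fewer selection changes, at the cost of slower tracking of a moving speaker.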

A BEF (band-elimination filter) 23 has a stop band of, for example, 300 to 3000 Hz and passes the components of the digital audio signal S-k that lie outside the stop band. The output signal of the BEF 23 indicates the intensity of the noise component in the digital audio signal S-k, but its value changes rapidly and frequently. If the output signal of the BEF 23 were output directly as the noise intensity signal En-k, the S/N ratio signal computed from the voice intensity signals Es-k (k = 1 to m) and the noise intensity signals En-k (k = 1 to m) would be unstable. An envelope generation unit 24, similar to the envelope generation unit 22, is therefore provided after the BEF 23. The envelope generation unit 24 outputs a noise intensity signal En-k representing an envelope in which the rapid changes in the output signal of the BEF 23 are smoothed.

FIG. 5 is a block diagram showing another configuration example of the extraction units 20-k (k = 1 to m). In this example, the BEF 23 of FIG. 4 is replaced with a subtracter 25. The subtracter 25 subtracts the output signal of the BPF 21 from the digital audio signal S-k and supplies the result to the envelope generation unit 24. In this configuration too, a voice intensity signal Es-k and a noise intensity signal En-k similar to those of FIG. 4 are output from the envelope generation units 22 and 24, respectively.

In FIG. 1, the output signal synthesis unit 30 is a circuit that either selects one of the digital audio signals S-k (k = 1 to m) and outputs it as the digital audio signal SS, or crossfades two of the digital audio signals S-k (k = 1 to m) and outputs the result as the digital audio signal SS. The output signal synthesis unit 30 consists of multipliers 31-k (k = 1 to m) that multiply the digital audio signals S-k (k = 1 to m) by coefficients a-k (k = 1 to m), an adder 32 that adds the output signals of the multipliers 31-k (k = 1 to m) and outputs the sum as the digital audio signal SS, and a synthesis control unit 33 that controls the coefficients a-k (k = 1 to m).

The switching control unit 40 is a circuit that monitors the voice intensity signals Es-k (k = 1 to m) and, based on the result, outputs selection signals Mnew and Mold and a crossfade signal CF. The selection signal Mnew indicates the index k of the digital audio signal S-k (k = 1 to m) that is most suitable as the final digital audio signal SS. The selection signal Mold indicates the value that the selection signal Mnew held immediately before it was changed to its current value. In principle, each time a periodic verification pulse Pc is supplied, the switching control unit 40 performs a switching control process that verifies the selection signals Mnew and Mold and updates them as necessary. In this switching control process, except while the crossfade signal CF is "1", the levels of the voice intensity signals Es-k (k = 1 to m) are compared and, roughly speaking, the selection signal Mnew is updated to indicate the index k of the voice intensity signal Es-k with the highest level. When the switching control process changes the content of the selection signal Mnew, it updates the selection signal Mold with the content that Mnew held before the change. Several variants of the switching control process are possible; to avoid duplication, their details are given in the description of the operation of this embodiment.

The synthesis control unit 33 in the output signal synthesis unit 30 monitors the selection signal Mnew updated in this way and controls the values of the coefficients a-k (k = 1 to m) so that the digital audio signal S-k with the index k specified by the selection signal Mnew is output as the final digital audio signal SS. Specifically, the synthesis control unit 33 sets the coefficient a-k with the index k specified by the selection signal Mnew to "1" and the other coefficients to "0".
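The multiply-and-add structure of the output signal synthesis unit 30 amounts to the weighted sum SS = a-1·S-1 + ... + a-m·S-m at each sampling instant. A minimal sketch follows; the function names `synthesize` and `selection_coeffs` are hypothetical, and indices are 1-based to match the text.

```python
def selection_coeffs(m, m_new):
    """Coefficients a-1..a-m: 1 for the index given by Mnew, 0 otherwise."""
    if not 1 <= m_new <= m:
        raise ValueError("m_new must be in 1..m")
    return [1.0 if k == m_new else 0.0 for k in range(1, m + 1)]


def synthesize(samples, coeffs):
    """One output sample of SS: the sum of a-k * S-k over all k.

    samples : current sample of each digital audio signal S-1..S-m
    coeffs  : coefficients a-1..a-m set by the synthesis control unit
    """
    if len(samples) != len(coeffs):
        raise ValueError("samples and coeffs must have the same length")
    return sum(a * s for a, s in zip(coeffs, samples))
```

With one coefficient at 1 and the rest at 0, the sum reduces to plain selection; the same structure serves for crossfading when two coefficients take intermediate values.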

Because the m microphones 11-k (k = 1 to m) of this embodiment have maximum sensitivity axes pointing in different directions, there is generally a level difference among the digital audio signals S-k (k = 1 to m). If the digital audio signal S-k used as the digital audio signal SS were switched immediately whenever the content of the selection signal Mnew changed, an unnatural discontinuity would appear in the digital audio signal SS. In this embodiment, therefore, when the switching control unit 40 changes the contents of the selection signals Mnew and Mold, it has the output signal synthesis unit 30 perform a crossfade over a predetermined period.

Specifically, when the switching control unit 40 changes the contents of the selection signals Mnew and Mold, it raises the crossfade signal CF from "0" to "1" at that moment, holds CF at "1" for a predetermined period, and then returns it to "0". While the crossfade signal CF is "1", the synthesis control unit 33 in the output signal synthesis unit 30 continuously changes the coefficient whose index is specified by the selection signal Mnew (call it a-knew) from "0" to "1", and the coefficient whose index is specified by the selection signal Mold (call it a-kold) from "1" to "0". Because the old and new digital audio signals S-k are crossfaded in this way, no unnatural discontinuity occurs in the digital audio signal SS.
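The coefficient trajectory during the crossfade period can be sketched as below. The text only says the coefficients change "continuously"; the linear ramp, the step count `n_steps`, and the function name `crossfade_coeffs` are assumptions made for illustration.

```python
def crossfade_coeffs(m, m_new, m_old, step, n_steps):
    """Coefficients a-1..a-m at a given step of the crossfade.

    m_new, m_old : 1-based indices given by the selection signals Mnew, Mold
    step         : 0 at the start of the CF = "1" period, n_steps at its end
    A linear ramp is assumed; the source does not specify the curve.
    """
    if m_new == m_old:
        raise ValueError("a crossfade needs two different signals")
    if not 0 <= step <= n_steps:
        raise ValueError("step must be in 0..n_steps")
    g = step / n_steps
    coeffs = [0.0] * m
    coeffs[m_old - 1] = 1.0 - g   # old signal fades from 1 to 0
    coeffs[m_new - 1] = g         # new signal fades from 0 to 1
    return coeffs
```

Because the two coefficients always sum to 1 in this sketch, the output level stays continuous throughout the transition.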

The S/N ratio signal generation unit 50 is a circuit that selects, as the S component, the voice intensity signal Es-k (k = 1 to m) with the index k specified by the selection signal Mnew; selects, as the N component, the noise intensity signal En-k (k = 1 to m) with the highest intensity; and outputs the result of dividing the signal level of the S component by the signal level of the N component as the S/N ratio signal. The output unit 60 is a circuit that outputs the final digital audio signal SS obtained from the output signal synthesis unit 30 and the S/N ratio signal obtained from the S/N ratio signal generation unit 50.
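The computation performed by the S/N ratio signal generation unit 50 can be written in a few lines. In this sketch the function name `sn_ratio` is hypothetical, and returning infinity when the noise level is zero is an assumption; the text does not say how that case is handled.

```python
def sn_ratio(es, en, m_new):
    """S/N ratio signal: Es of the selected index divided by the largest En.

    es, en : lists of voice and noise intensity levels, Es-1..Es-m, En-1..En-m
    m_new  : 1-based index given by the selection signal Mnew
    """
    s = es[m_new - 1]      # S component: voice intensity of the selected signal
    n = max(en)            # N component: the highest noise intensity of all
    if n == 0:
        return float("inf")  # assumed handling of a zero noise level
    return s / n
```

Using the largest noise level rather than the noise level of the selected channel gives a conservative figure for the downstream device.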
The above is the configuration of the present embodiment.

<Operation of Embodiment>
(1) Overall Operation
The operation of this embodiment is described next. FIG. 6 is a time chart showing an operation example of this embodiment, for a sound collection device with three microphones 11-k (k = 1 to 3) as illustrated in FIG. 2 or FIG. 3. As in this example, each time the periodic verification pulse Pc occurs, the switching control unit 40 executes the switching control process and compares the levels of the voice intensity signals Es-k (k = 1 to 3).

In this operation example, the speaker's mouth, which is the sound source, moves from the front of the sound collection device to the right. While the sound source is in front of the sound collection device, the voice intensity signal Es-2 has the highest level among the voice intensity signals Es-k (k = 1 to 3). The repeatedly executed switching control process therefore sets the selection signal Mnew to "2", the index designating the digital audio signal S-2 obtained from the central microphone 11-2.

However, as the sound source moves from the center of the sound collection device to the right, the level of the voice intensity signal Es-2 gradually falls and the level of the voice intensity signal Es-3 gradually rises. In the operation example, when the switching control process is executed at time t1, the levels of the voice intensity signals Es-2 and Es-3 have reversed in magnitude, so the selection signal Mnew is set to "3" and the selection signal Mold is set to "2". From this point on, the crossfade signal CF is held at "1" for a predetermined period. While the crossfade signal CF is "1", the switching control process is not executed even if a verification pulse Pc occurs.
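The behavior described so far (compare levels on each verification pulse, update Mnew and Mold on a change, and ignore pulses while CF is "1") can be sketched as a small state machine. The class name `SwitchingControl` and the idea of measuring the crossfade period as a count of verification pulses (`hold_pulses`) are assumptions; the source only speaks of "a predetermined period". The refinements of the switching control process described later are not included.

```python
class SwitchingControl:
    """Basic switching control: pick the index of the largest Es-k."""

    def __init__(self, initial, hold_pulses):
        self.m_new = initial            # selection signal Mnew (1-based)
        self.m_old = initial            # selection signal Mold (1-based)
        self.hold_pulses = hold_pulses  # assumed length of the CF = "1" period
        self.cf_remaining = 0           # pulses left in the crossfade period

    @property
    def cf(self):
        """Crossfade signal CF: 1 during the crossfade period, else 0."""
        return 1 if self.cf_remaining > 0 else 0

    def on_verification_pulse(self, es):
        """Handle one verification pulse Pc with levels Es-1..Es-m."""
        if self.cf_remaining > 0:
            # While CF is "1" the switching control process is skipped.
            self.cf_remaining -= 1
            return self.m_new
        best = max(range(len(es)), key=lambda i: es[i]) + 1
        if best != self.m_new:
            self.m_old = self.m_new
            self.m_new = best
            self.cf_remaining = self.hold_pulses
        return self.m_new
```

For example, starting with `SwitchingControl(initial=2, hold_pulses=2)`, a pulse on which Es-3 has overtaken Es-2 sets Mnew to 3 and Mold to 2, and the next two pulses are ignored regardless of the levels they carry.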

Over the period in which the crossfade signal CF is "1", the output signal synthesis unit 30 lowers the coefficient a-2, by which the digital audio signal S-2 is multiplied, from "1" to "0", and raises the coefficient a-3, by which the digital audio signal S-3 is multiplied, from "0" to "1". As a result, the final digital audio signal SS moves naturally from the digital audio signal S-2 to the digital audio signal S-3.

As described above, the S/N ratio signal generation unit 50 computes the S/N ratio signal from the voice intensity signals Es-k (k = 1 to 3) and the noise intensity signals En-k (k = 1 to 3). In this operation example, while the selection signal Mnew is "2", the S/N ratio signal is computed from the voice intensity signal Es-2 corresponding to the index "2" and the highest-level signal among the noise intensity signals En-k (k = 1 to 3). While the selection signal Mnew is "3", the S/N ratio signal is computed from the voice intensity signal Es-3 corresponding to the index "3" and the highest-level signal among the noise intensity signals En-k (k = 1 to 3). The output unit 60 outputs the digital audio signal SS and the S/N ratio signal obtained in this way to the downstream device.

(2) Aspects of the Switching Control Process
The switching control process executed by the switching control unit 40 in this embodiment only needs to be responsive enough to follow the movement of the speaker's mouth. If the switching control process responds too sensitively to changes in the voice intensity signals Es-k (k = 1 to m), the digital audio signal S-k used as the final digital audio signal SS is switched frequently, and the final digital audio signal SS sounds unnatural. Several aspects of the switching control process that prevent this problem are described below, taking the case of m = 3 as an example.

a. First Aspect
This aspect uses a threshold th that marks the boundary between the voice level and the background noise level. Only those voice intensity signals Es-k (k = 1 to 3) whose level is at or above the threshold th are considered when selecting the digital audio signal S-k. FIGS. 7(a) and 7(b) show execution examples of the switching control process in this aspect. In each example, a verification pulse Pc occurs at time t11 and at time t12, and the switching control process is executed. To keep the figures uncluttered, the voice intensity signals Es-k (k = 1 to 3) occurring at times t11 and t12 are drawn side by side in the horizontal direction.

In the example of FIG. 7(a), in the switching control process at time t11, the voice intensity signal Es-2 has the highest level and is at or above the threshold th, so the selection signal Mnew is set to "2" and the digital audio signal S-2 is selected as the digital audio signal SS. In the switching control process at time t12, the voice intensity signal Es-1 has the highest level and is at or above the threshold th, so the selection signal Mnew is set to "1" and the digital audio signal S-1 is selected as the digital audio signal SS.

In the example of FIG. 7(b), however, none of the voice intensity signals Es-k (k = 1 to 3) reaches the threshold th in the switching control process at time t12, so there is no voice intensity signal Es-k on which to base the selection of a digital audio signal S-k. The switching control process at time t12 therefore keeps the selection signal Mnew = "2" obtained in the switching control process at time t11.

According to this aspect, even if the relative magnitudes of the voice intensity signals Es-k (k = 1 to 3) change within the range of the background noise level, such changes are ignored and the current selection signal Mnew is kept. This prevents the digital audio signal S-k used as the digital audio signal SS from being switched frequently when the level of the collected voice is low.
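The selection rule of the first aspect can be sketched as a single function. The name `select_first_aspect` is hypothetical, and the numeric levels in the usage note are invented for illustration; FIG. 7 gives no numeric values. Ties between equal levels are resolved in favor of the lowest index, which is also an assumption.

```python
def select_first_aspect(es, th, m_current):
    """Return the new value of Mnew under the first aspect.

    es        : levels of Es-1..Es-m at this verification pulse
    th        : threshold separating voice level from background noise
    m_current : current value of Mnew (1-based), kept if nothing qualifies
    """
    # Only signals at or above the threshold are considered at all.
    candidates = [(level, k) for k, level in enumerate(es, start=1)
                  if level >= th]
    if not candidates:
        return m_current   # FIG. 7(b) case: keep the existing selection
    # Highest level wins; on a tie the lowest index is kept (assumption).
    return max(candidates, key=lambda c: c[0])[1]
```

With an assumed threshold of 0.2, levels `[0.3, 0.8, 0.4]` select index 2 and levels `[0.9, 0.5, 0.3]` select index 1, mirroring FIG. 7(a); levels `[0.1, 0.15, 0.05]` leave a current selection of 2 unchanged, mirroring FIG. 7(b).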

b. Second Aspect
In this aspect, as in the first aspect, only voice intensity signals Es-k (k = 1 to 3) whose level is at or above the threshold th are considered in the switching control process. In addition, for a digital audio signal S-k to be selected as the digital audio signal SS, it is not sufficient that the corresponding voice intensity signal Es-k has the highest level among the voice intensity signals Es-k (k = 1 to 3). The level of the corresponding voice intensity signal Es-k must also exceed the level of the voice intensity signal that had the highest level in the previous switching control process.

FIG. 8 shows an example of the switching control process in this mode. In this example, in the switching control process at time t22, the level of the sound intensity signal Es-2 is the highest and is at or above the threshold th. Moreover, this level is greater, by a positive value iVGC, than the level of the sound intensity signal Es-1, which had the highest level in the previous switching control process (at time t21). In the switching control process at time t22, therefore, the selection signal Mnew is set to "2" and the digital audio signal S-2 is selected as the digital audio signal SS.

Although not illustrated, if the level of the sound intensity signal Es-2, which is the highest in the switching control process at time t22, were equal to or lower than the level of the sound intensity signal Es-1 at the time of the switching control process at time t21, the digital audio signal S-2 would not be selected as the digital audio signal SS.

According to this mode, the selection signal Mnew is switched only when a clear change occurs in the relative magnitudes of the sound intensity signals Es-k (k = 1 to 3), which prevents frequent switching of the digital audio signal S-k that is output as the digital audio signal SS.
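A rough sketch of the second mode follows. The function name `select_second_mode` and the way state is passed in and out are hypothetical. The text does not say what happens to the stored "previous maximum" when no signal reaches the threshold, so keeping the stored value in that case is an assumption of this sketch.

```python
from typing import Sequence, Tuple


def select_second_mode(
    levels: Sequence[float], th: float, m_prev: int, prev_max: float
) -> Tuple[int, float]:
    """Run one switching control pass and return (Mnew, maximum to remember).

    prev_max -- level of the signal that was highest in the previous pass
    """
    candidates = [(lvl, k) for k, lvl in enumerate(levels, start=1) if lvl >= th]
    if not candidates:
        # Assumption: with no usable evidence, keep both the selection
        # and the remembered maximum as they were.
        return m_prev, prev_max
    cur_max, cur_k = max(candidates, key=lambda c: c[0])
    # Switch only if the current maximum exceeds the previous maximum,
    # i.e. the increment iVGC = cur_max - prev_max is positive.
    m_new = cur_k if cur_max > prev_max else m_prev
    return m_new, cur_max
```

The returned maximum is fed back as `prev_max` on the next pass, so each pass compares against the pass immediately before it.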

c. Third mode
This mode further improves the stability of the selection signal Mnew relative to the second mode. As in the first and second modes, only sound intensity signals Es-k (k = 1 to 3) at or above the threshold th are used as a basis for the decision in the switching control process. In this mode, for the digital audio signal S-k corresponding to a given sound intensity signal Es-k to be selected as the digital audio signal SS, the following conditions must be satisfied.
Condition 1: The level of that sound intensity signal Es-k is the highest among the sound intensity signals Es-k (k = 1 to 3).
Condition 2: Let iVGC be the increment of that sound intensity signal Es-k over the highest-level sound intensity signal in the previous switching control process, and let iVGCR be its increment over the highest-level sound intensity signal in the switching control process before the previous one. Then iVGCR > iVGC.

FIG. 9 shows an example of the switching control process in this mode. In this example, in the switching control process at time t33, the level of the sound intensity signal Es-2 is the highest and is at or above the threshold th. The level of the sound intensity signal Es-2 at time t33 is greater, by a positive value iVGC, than the level of the sound intensity signal Es-2, which had the highest level in the previous switching control process (at time t32). Furthermore, the level of the sound intensity signal Es-2 at time t33 is greater, by a positive value iVGCR, than the level of the sound intensity signal Es-1, which had the highest level in the switching control process before the previous one (at time t31). Here, iVGCR > iVGC. In the switching control process at time t33, therefore, the selection signal Mnew is set to "2" and the digital audio signal S-2 is selected as the digital audio signal SS.

Although not illustrated, even if the level of the sound intensity signal Es-2 were the highest in the switching control process at time t33, the digital audio signal S-2 would not be selected as the digital audio signal SS unless the condition iVGCR > iVGC were satisfied.

According to this mode, temporary changes in the relative magnitudes of the sound intensity signals Es-k (k = 1 to 3) are ignored. Only when a sound intensity signal Es-k is at the highest level and is clearly on an increasing trend is the corresponding digital audio signal S-k selected as the final digital audio signal SS. This prevents frequent switching of the digital audio signal S-k that is output as the digital audio signal SS.
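The two conditions of the third mode can be sketched as below. The function name `select_third_mode` and its parameters are hypothetical, and the sketch checks only Conditions 1 and 2 plus the threshold as stated in the text; whether the increment iVGC must additionally be positive, as in the second mode, is not spelled out, so that check is left out here.

```python
from typing import Sequence


def select_third_mode(
    levels: Sequence[float],
    th: float,
    m_prev: int,
    prev_max: float,
    prevprev_max: float,
) -> int:
    """Return Mnew for one switching control pass in the third mode.

    prev_max     -- highest level in the previous pass
    prevprev_max -- highest level in the pass before the previous one
    """
    candidates = [(lvl, k) for k, lvl in enumerate(levels, start=1) if lvl >= th]
    if not candidates:
        return m_prev
    # Condition 1: take the signal with the highest current level.
    cur_max, cur_k = max(candidates, key=lambda c: c[0])
    # Condition 2: compare the two increments.
    ivgc = cur_max - prev_max        # increment over the previous maximum
    ivgcr = cur_max - prevprev_max   # increment over the maximum two passes ago
    # Both increments are measured from the same current level, so
    # ivgcr > ivgc holds exactly when prev_max > prevprev_max, that is,
    # when the maximum was already rising between the two earlier passes.
    return cur_k if ivgcr > ivgc else m_prev
```

Written this way, Condition 2 reads as a check that the rise was already under way one pass earlier, which matches the "increasing trend" wording above.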

(3) Modes of output of the digital audio signal SS and the S/N ratio signal
The output unit 60 can output the digital audio signal SS and the S/N ratio signal in various ways.

In one mode, as illustrated in FIG. 10, the output unit 60 outputs a pair consisting of the S/N ratio signal and the digital audio signal SS for every sample. In this case, the samples of the S/N ratio signal and of the digital audio signal may be separate words. Alternatively, the output unit 60 may be configured to sequentially output words in which the S/N ratio signal forms the upper bit string and the digital audio signal SS forms the lower bit string. This mode has the advantage that a downstream device receiving the output signal of the sound collecting device can obtain the digital audio signal and the corresponding S/N ratio signal at any time.
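The single-word variant (S/N ratio in the upper bits, audio sample in the lower bits) might look like the following sketch. The bit widths, 16-bit signed audio and an 8-bit unsigned S/N code, and the names `pack_word` and `unpack_word` are assumptions chosen for illustration; the source does not specify a word format.

```python
from typing import Tuple

AUDIO_BITS = 16  # assumed width of one audio sample (signed)
SNR_BITS = 8     # assumed width of the S/N ratio code (unsigned)
AUDIO_MASK = (1 << AUDIO_BITS) - 1


def pack_word(snr_code: int, sample: int) -> int:
    """Pack an S/N code (upper bits) and a signed sample (lower bits)."""
    if not 0 <= snr_code < (1 << SNR_BITS):
        raise ValueError("snr_code out of range")
    half = 1 << (AUDIO_BITS - 1)
    if not -half <= sample < half:
        raise ValueError("sample out of range")
    # Store the sample in two's complement form in the lower bits.
    return (snr_code << AUDIO_BITS) | (sample & AUDIO_MASK)


def unpack_word(word: int) -> Tuple[int, int]:
    """Recover (snr_code, sample) from a packed word."""
    snr_code = word >> AUDIO_BITS
    raw = word & AUDIO_MASK
    # Undo the two's complement encoding of negative samples.
    sample = raw - (1 << AUDIO_BITS) if raw >= (1 << (AUDIO_BITS - 1)) else raw
    return snr_code, sample
```

A receiver can call `unpack_word` on any word in the stream and obtain a sample together with its S/N code, which is the property highlighted above.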

In another mode, as illustrated in FIG. 11, the output unit 60 divides the digital audio signal SS into frames each consisting of a predetermined number of samples and outputs, for each frame, a representative S/N ratio signal for that frame (for example, the average value) together with the predetermined number of samples of the digital audio signal SS belonging to that frame. This mode has the advantage of reducing the overall amount of data.
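A minimal sketch of the frame-based output follows, using the average as the representative S/N value, as the text suggests. The function name `frame_output`, the tuple layout, and the decision to drop a trailing partial frame are assumptions of this example.

```python
from statistics import fmean
from typing import List, Sequence, Tuple


def frame_output(
    samples: Sequence[int], snr_values: Sequence[float], frame_len: int
) -> List[Tuple[float, List[int]]]:
    """Group samples into frames of frame_len, one S/N value per frame."""
    if len(samples) != len(snr_values):
        raise ValueError("samples and snr_values must have equal length")
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    frames = []
    # Assumption: a trailing partial frame is not emitted.
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        end = start + frame_len
        representative = fmean(snr_values[start:end])  # average S/N of the frame
        frames.append((representative, list(samples[start:end])))
    return frames
```

Compared with the per-sample output, each frame carries one S/N value instead of `frame_len` values, which is where the data reduction comes from.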

<Effects of the embodiment>
As described above, in this embodiment, even when the position of the sound source changes, the digital audio signal S-k whose sound component has the highest intensity is selected and output as the final digital audio signal SS. A digital audio signal can therefore always be obtained with the maximum sound receiving sensitivity, regardless of changes in the position of the sound source. In addition, when the digital audio signal to be output as the final digital audio signal SS is switched, this embodiment takes a certain time to crossfade between the old and new digital audio signals, which has the advantage of avoiding unnatural discontinuities in the output digital audio signal SS.
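The crossfade between the old and new digital audio signals could be sketched as a pair of complementary gains applied to the two signals and summed. The linear ramp, the function name `crossfade`, and the parameter `fade_len` (fade duration in samples) are assumptions; the text states only that the crossfade takes a certain time.

```python
from typing import List, Sequence


def crossfade(old: Sequence[float], new: Sequence[float], fade_len: int) -> List[float]:
    """Mix from `old` to `new` over fade_len samples with a linear ramp."""
    if fade_len <= 0:
        raise ValueError("fade_len must be positive")
    out = []
    for n, (a, b) in enumerate(zip(old, new)):
        # Gain of the new signal rises linearly to 1.0 and then stays there.
        g = min(1.0, (n + 1) / fade_len)
        # The old signal receives the complementary gain.
        out.append((1.0 - g) * a + g * b)
    return out
```

Because the two gains always sum to one, the output moves smoothly from the old signal to the new one instead of jumping at the switching instant.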

<Other embodiments>
Although one embodiment of the present invention has been described above, other embodiments are also conceivable. For example, in the above embodiment, the digital audio signal S-k that becomes the final digital audio signal SS is selected on the basis of the sound intensity signals Es-k (k = 1 to m). Instead, each of the sound intensity signals Es-k (k = 1 to m) may be divided by the corresponding noise intensity signal En-k (k = 1 to m) to generate S/N ratio signals S/N-k (k = 1 to m), and the digital audio signal S-k corresponding to the S/N ratio signal S/N-k with the highest level may be selected as the final digital audio signal SS. According to this mode, when noise occurs in a particular direction, for example, the level of the noise intensity signal generated from the output signal of the microphone that picked up the noise increases, so the digital audio signal obtained from that microphone can be prevented from being selected as the final digital audio signal SS. Sound can therefore be collected with a high S/N ratio even when local noise occurs suddenly.
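Selection by S/N ratio rather than by sound intensity alone can be illustrated as follows. The function name `select_by_snr` and the small floor `eps` used to avoid division by zero are assumptions added for the example.

```python
from typing import List, Sequence, Tuple


def select_by_snr(
    es: Sequence[float], en: Sequence[float], eps: float = 1e-12
) -> Tuple[int, List[float]]:
    """Return (1-based index of the best channel, list of S/N ratios).

    es -- sound intensity levels Es-1 .. Es-m
    en -- noise intensity levels En-1 .. En-m
    """
    if len(es) != len(en) or not es:
        raise ValueError("es and en must be non-empty and of equal length")
    # S/N-k = Es-k / En-k, with a floor on the denominator (assumption).
    ratios = [s / max(n, eps) for s, n in zip(es, en)]
    best = max(range(len(ratios)), key=ratios.__getitem__)
    return best + 1, ratios
```

For example, with Es = (0.8, 0.9, 0.5) and En = (0.1, 0.6, 0.1), channel 2 has the highest sound intensity but also the most noise, so channel 1 is selected.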

FIG. 1 is a block diagram showing the configuration of a sound collecting device according to one embodiment of the present invention.
FIG. 2 is a diagram showing an example of how the microphones are mounted in the embodiment.
FIG. 3 is a diagram showing another example of how the microphones are mounted in the embodiment.
FIG. 4 is a block diagram showing a configuration example of the extraction unit in the embodiment.
FIG. 5 is a block diagram showing another configuration example of the extraction unit in the embodiment.
FIG. 6 is a time chart showing the operation of the embodiment.
FIG. 7 is a time chart showing the first mode of the switching control process in the embodiment.
FIG. 8 is a time chart showing the second mode of the switching control process in the embodiment.
FIG. 9 is a time chart showing the third mode of the switching control process in the embodiment.
FIG. 10 is a diagram showing one mode of output of the S/N ratio signal and the digital audio signal by the output unit in the embodiment.
FIG. 11 is a diagram showing another mode of output of the S/N ratio signal and the digital audio signal by the output unit in the embodiment.

Explanation of symbols

11-k (k = 1 to m) ... microphone, 12-k (k = 1 to m) ... A/D converter, 20-k (k = 1 to m) ... extraction unit, 30 ... output signal synthesis unit, 31-k (k = 1 to m) ... multiplier, 32 ... adder, 33 ... synthesis control unit, 40 ... switching control unit, 50 ... S/N ratio signal generation unit, 60 ... output unit.

Claims

A sound collecting device comprising:
a plurality of microphones that pick up sound from the outside world and output electrical signals;
output signal synthesizing means for synthesizing and outputting an audio signal to be output from the output signals of the plurality of microphones;
extraction means for extracting at least a sound component from each output signal of the plurality of microphones and outputting a signal indicating the intensity or S/N ratio of the sound component of each signal; and
switching control means for executing, based on the output signal of the extraction means, a switching control process that controls the output signal synthesizing means so that, among the output signals of the plurality of microphones, a signal having a high sound component intensity or a high S/N ratio is output by the output signal synthesizing means.

The sound collecting apparatus according to claim 1, wherein the switching control unit periodically repeats the switching control process.

The sound collecting device according to claim 2, wherein the extraction means outputs a plurality of sound intensity signals each indicating the intensity of the sound component of the respective output signal of the plurality of microphones, and the switching control means executes the switching control process based on the plurality of sound intensity signals.

The sound collecting device according to claim 3, wherein, in the periodically repeated switching control process, when the intensity of the sound component of the microphone output signal recognized as the maximum in the current switching control process is higher than the intensity of the sound component of the microphone output signal recognized as the maximum in the previous switching control process, the switching control means controls the output signal synthesizing means so that the microphone output signal recognized as having the maximum sound component intensity in the current switching control process is output by the output signal synthesizing means.

The sound collecting device according to claim 3, wherein, in the periodically repeated switching control process, the switching control means compares a first increase, which is the increase of the intensity of the sound component of the microphone output signal recognized as the maximum in the current switching control process over the intensity of the sound component of the microphone output signal recognized as the maximum in the previous switching control process, with a second increase, which is the increase of the intensity of the sound component of the microphone output signal recognized as the maximum in the current switching control process over the intensity of the sound component of the microphone output signal recognized as the maximum in the switching control process before the previous one, and, when the second increase is greater than the first increase, controls the output signal synthesizing means so that the microphone output signal recognized as having the maximum sound component intensity in the current switching control process is output by the output signal synthesizing means.

6. The sound collecting device according to claim 2, wherein the switching control unit executes the switching control process using only a sound intensity signal having a predetermined threshold value or more as a determination material.

The sound collecting device according to any one of claims 1 to 6, wherein, when switching the signal to be output from one of the output signals of the plurality of microphones to another, the output signal synthesizing means takes a predetermined time to cross-fade from the signal before switching to the signal after switching.

4. The sound collection device according to claim 3, wherein the extraction unit outputs an envelope of an effective value of a signal in a sound frequency band included in output signals of the plurality of microphones as the sound intensity signal.

The sound collecting device according to any one of claims 1 to 8, wherein the plurality of microphones are fixed with different directions in which the maximum sound receiving sensitivity can be obtained.