JP2006058395A

JP2006058395A - Sound signal input/output device

Info

Publication number: JP2006058395A
Application number: JP2004237530A
Authority: JP
Inventors: Shigeru Ando (安藤繁); Toshio Hagiwara (萩原俊男)
Original assignee: SPECTRA KK
Current assignee: SPECTRA KK
Priority date: 2004-08-17
Filing date: 2004-08-17
Publication date: 2006-03-02

Abstract

PROBLEM TO BE SOLVED: To provide a sound signal input/output device for the voice recognition devices of various systems equipped with a speech recognition function.

SOLUTION: The sound signal input/output device 30 is equipped with a sound signal input means 1 formed by combining microphones 1a to 1d spaced a prescribed interval d apart, an A/D converter 2 which converts the analog sound signals Sa to Sd of the respective microphones 1a to 1d into digital sound signals Da to Dd, a sound source localization means 3 which detects the direction of a sound source by analyzing the converted digital sound signals Da to Dd, a sound source angle deciding means 11 which decides whether the sound source A localized by the sound source localization means 3 lies in a direction within a preset angle range δ, a voice detecting means 12 which decides whether the sound source A is a human voice by analyzing the digital sound signal Dd, and a gate circuit 20 which outputs the digital sound signal Dd only when the sound source A lies in a direction within the preset angle range δ and a human voice is included.

COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to an acoustic-signal input/output interface for the speech recognition devices of systems in which both recognizing a human voice and knowing the direction of its source matter, such as car navigation systems and person authentication systems equipped with a speech recognition function.

Conventionally, the acoustic signal fed to the speech recognition device (speech recognition circuit) built into a car navigation system or person authentication system with a speech recognition function has been the raw signal picked up by a single microphone pointed in the direction where the face of the target person, who is at a roughly predetermined location, is expected to be.

For example, in a car navigation system with a speech recognition function, a directional microphone mounted on the vehicle ceiling, dashboard, or steering wheel and aimed near the driver's mouth picks up the driver's voice. The speech recognition device analyzes the speech picked up by the microphone (the driver's spoken commands) and transmits the result to the system controller (a control microcomputer), which controls the system.

Consequently, however directional the microphone is, the acoustic signal it picks up contains at least some of the voices of other people nearby (passengers in the front passenger seat or rear seats), radio audio, and other noise, and all of this is fed unconditionally to the speech recognition device.

As known art related to speech recognition, [Patent Document 1] below describes a speech analysis apparatus that determines whether an input acoustic signal is a human voice.

In the first speech analysis apparatus described in [Patent Document 1], a detector finds a local minimum of the normalized average magnitude difference function (AMDF) and the nearest local maximum; an interpolator obtains the true minimum from the detected minimum and feeds it to a pitch detector; a guide pitch calculator derives a guide pitch, which gives the expected range of the pitch found by the pitch detector, and feeds it to the pitch detector as well; and a weighting variable is introduced into the pitch detector. This configuration is said to reduce the pitch extraction error rate markedly and to prevent degradation of sound quality as far as possible.

In the second speech analysis apparatus, a voiced/unvoiced discriminator receives the speech signal converted by the analog-to-digital converter, the parameters extracted by a spectral envelope parameter extractor, and the difference between the true minimum of the normalized average magnitude difference function corresponding to the pitch found by the pitch detector and the nearest local maximum. This configuration is said to reduce the voiced/unvoiced discrimination error rate markedly.

Next, as an auditory sensor system for sound source localization, which determines where a sound is coming from, [Patent Document 2] and [Non-Patent Document 1] below describe a three-dimensional sound source localization sensor system that relies on a localization algorithm using the spatio-temporal differential method (spatio-temporal gradient method) and sums of products of derivatives. The principle of this localization is as follows.

A sound wave propagating as a spherical wave at constant speed spreads over a spherical surface whose area grows with the square of the distance, so its amplitude decays in inverse proportion to the distance. When the wave is observed at four points by four microphones placed close together at the corners of a square, the wavefront reaches the microphones nearer the source slightly earlier and those farther away slightly later. Because the propagation speed of sound and the microphone spacing are known, the wavefront normal, that is, the direction of the source, follows from the time differences among the acoustic signals captured by the four microphones.
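As a rough illustration of the time-difference step, the relation can be written out under a far-field (plane-wave) simplification. This is an assumption made here for brevity; the text above describes spherical waves. The symbols are introduced for illustration only: c is the speed of sound, Δt_x and Δt_y are the arrival-time differences between the microphone pairs along the two sides of the square of side d, (u_x, u_y) are the components of the unit vector toward the source in the array plane, θ is the angle from the array normal, and φ is the angle around it.

```latex
\Delta t_x = \frac{d\,u_x}{c}, \qquad
\Delta t_y = \frac{d\,u_y}{c}, \qquad
\sin\theta = \frac{c}{d}\sqrt{\Delta t_x^{2} + \Delta t_y^{2}}, \qquad
\varphi = \operatorname{atan2}\!\left(\Delta t_y,\ \Delta t_x\right)
```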

Because the amplitude of the sound wave decays in inverse proportion to the distance, the distance to the source can also be obtained easily from the amplitude ratio between a microphone nearer the source and one farther away, together with the difference in their distances from the source.
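The amplitude-ratio relation can be sketched as follows. This is a minimal illustration that assumes an ideal 1/r amplitude law and a known path difference between the two microphones; the function name and parameters are hypothetical and do not come from the cited documents.

```python
def distance_from_amplitude_ratio(a_near: float, a_far: float,
                                  path_difference_m: float) -> float:
    """Distance from the source to the nearer microphone, assuming amplitude ~ 1/r.

    a_near / a_far = r_far / r_near and r_far = r_near + path_difference_m,
    so r_near = path_difference_m / (a_near / a_far - 1).
    """
    ratio = a_near / a_far
    if ratio <= 1.0:
        raise ValueError("the nearer microphone must observe the larger amplitude")
    return path_difference_m / (ratio - 1.0)


# Example: source 0.50 m from the near microphone, far microphone 0.03 m farther away.
# The amplitudes are generated from the same 1/r law that the function inverts.
r_near = distance_from_amplitude_ratio(1 / 0.50, 1 / 0.53, 0.03)
```

In practice the ratio is very close to 1 when the spacing is small compared with the source distance, so the estimate is sensitive to gain mismatch between microphones.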

The sound source localization sensor system based on this principle uses the spatio-temporal differential method and localizes the source by controlling the azimuth of a sound collecting device. It comprises time difference calculation means that calculates, at predetermined intervals, the time differences with which the acoustic signal emitted by the source reaches the sound collecting device; self-evaluation calculation means that calculates a self-evaluation quantity serving as the criterion for judging the validity of the calculated time difference; determination means that judges the validity of the calculated time difference from that self-evaluation quantity; azimuth control means that changes the azimuth when the validity is judged to be low; and sound source localization means that localizes the source from the time differences judged to be highly valid.

[Patent Document 1] Japanese Patent Publication No. 13-3195700

[Patent Document 2] Japanese Patent Publication No. 9-2641417

[Non-Patent Document 1] Shigeru Ando, Hiroyuki Shinoda, Katsuya Ogawa, Kunomi Mitsuyama, "3D sound source localization sensor system based on spatiotemporal gradient method," Transactions of the Society of Instrument and Control Engineers, Vol. 29, No. 5, pp. 520-528 (1993).

As described above, in conventional car navigation systems, person authentication systems, and similar systems with a speech recognition function, sounds other than the target person's voice, such as the voices of other people nearby and non-speech noise, are unavoidably fed to the speech recognition device. This has caused a considerable number of recognition errors and, in turn, system malfunctions.

In view of the problem that speech from people other than the target is fed to the speech recognition device (speech recognition circuit) of various systems with a speech recognition function and causes those systems to malfunction, the present invention provides an acoustic signal input/output device that screens the acoustic signal picked up by the microphone against conditions before it reaches the system's speech recognition device, thereby reducing malfunctions of such systems as far as possible.

The present invention solves the above problem by providing the following.
(1) An acoustic signal input/output device 30 comprising: acoustic signal input means 1 formed by combining a plurality of microphones 1a, 1b, 1c, 1d spaced a predetermined distance d apart; an A/D converter 2 that converts the analog acoustic signals Sa, Sb, Sc, Sd of the microphones 1a, 1b, 1c, 1d obtained by the acoustic signal input means 1 into digital acoustic signals Da, Db, Dc, Dd; sound source localization means 3 that analyzes the digital acoustic signals Da, Db, Dc, Dd converted by the A/D converter 2 and detects the direction of the sound source; sound source angle determination means 11 that determines whether the sound source A localized by the sound source localization means 3 lies in a direction within a preset angle range δ; voice detection means 12 that determines whether the sound source A is a human voice by analyzing the digital acoustic signal Dd of the microphone 1d converted by the A/D converter 2; and a gate circuit 20 that outputs the digital acoustic signal Dd only when the sound source A obtained by the acoustic signal input means 1 lies in a direction within the preset angle range δ and a human voice is included.

(2) An acoustic signal input/output device 40 comprising: acoustic signal input means 1 formed by combining a plurality of microphones 1a, 1b, 1c, 1d spaced a predetermined distance d apart; an A/D converter 2 that converts the analog acoustic signals Sa, Sb, Sc, Sd of the microphones 1a, 1b, 1c, 1d obtained by the acoustic signal input means into digital acoustic signals; sound source localization means 3 that analyzes the digital acoustic signals Da, Db, Dc, Dd converted by the A/D converter 2 and detects the direction of the sound source; sound source angle determination means 11 that determines whether the sound source localized by the sound source localization means 3 lies in a direction within a preset angle range δ; voice detection means 12 that determines whether the sound source A is a human voice by analyzing the digital acoustic signal Dd of the microphone 1d converted by the A/D converter 2; and a gate circuit 21 that outputs the analog acoustic signal Sd only when the sound source A obtained by the acoustic signal input means 1 lies in a direction within the preset angle range δ and a human voice is included.

When the input angle is set to a predetermined value, this acoustic signal input/output device outputs audio to the target system's speech recognition device only for sound from a source within the set angle range, and only when that sound is a human voice. The acoustic signal therefore reaches the speech recognition device built into a system such as a car navigation or person authentication system, and speech recognition is triggered, only when a human voice is present within that limited range. Recognition errors of the speech recognition device are thereby greatly reduced.

The acoustic signal input/output device according to claim 1 of the present invention reduces recognition errors of a speech recognition device with digital audio input in the target system, and thereby reduces malfunctions of the target system.

The acoustic signal input/output device according to claim 2 of the present invention reduces recognition errors of a speech recognition device with analog audio input in the target system, and thereby reduces malfunctions of the target system.

Embodiments of the acoustic signal input/output device according to the present invention are described below with reference to the drawings.

FIG. 1 is a block diagram showing a configuration example of the first acoustic signal input/output device according to the present invention. FIG. 2 is a block diagram showing a configuration example of the second acoustic signal input/output device according to the present invention. FIG. 3 is a perspective view showing the configuration of the acoustic signal input means, formed by combining a plurality of microphones spaced a predetermined distance apart. FIG. 4 illustrates a system that combines the digital-output acoustic signal input/output device of the present invention with a car navigation system containing a digital-input speech recognition device. FIG. 5 illustrates a system that combines the analog-output acoustic signal input/output device of the present invention with a person authentication system containing an analog-input speech recognition device.

In FIG. 1, the acoustic signal input/output device 30 comprises: acoustic signal input means 1 formed, as shown in FIG. 3, by combining four microphones 1a, 1b, 1c, 1d spaced a predetermined distance d apart (placed in parallel, facing the same direction, at the corners of a square of side d); an A/D converter 2 that converts the analog acoustic signals Sa, Sb, Sc, Sd of the microphones 1a, 1b, 1c, 1d obtained by the acoustic signal input means 1 into digital acoustic signals Da, Db, Dc, Dd (for example, 12 bits); sound source localization means 3 that analyzes the digital acoustic signals Da, Db, Dc, Dd converted by the A/D converter 2 and detects the direction of the sound source A; sound source angle determination means 11 that determines whether the sound source A localized by the sound source localization means 3 lies in a direction within a preset angle range δ; voice detection means 12 that determines whether the sound source A is a human voice by analyzing the digital acoustic signal Dd of the microphone 1d converted by the A/D converter 2; and a gate circuit 20 that outputs the digital acoustic signal Dd only when the sound source A obtained by the acoustic signal input means 1 lies in a direction within the preset angle range δ and a human voice is included.

In more detail, the digital acoustic signal input to the voice detection means 12 is not limited to the digital acoustic signal Dd of the microphone 1d; it may be any one of the digital acoustic signals Da, Db, Dc of the other microphones 1a, 1b, 1c, or several of them. Likewise, the digital acoustic signal supplied to the gate circuit 20 is not limited to the digital acoustic signal Dd of FIG. 1; it may be any one of the digital acoustic signals Da, Db, Dc of the other microphones 1a, 1b, 1c, or several of them. In a typical system, however, any one signal is sufficient.

In the acoustic signal input/output device 30, the gate circuit 20 is an AND logic circuit whose inputs are the output signal E1 of the sound source angle determination means 11, the output signal E2 of the voice detection means 12, and the digital acoustic signal Dd.
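The behavior of such a digital gate can be modeled in a few lines. This is only a software sketch of the AND logic, assuming Dd arrives as unsigned 12-bit sample codes (the 12-bit width is given elsewhere in the text as an example); the function name and the mask constant are hypothetical.

```python
from typing import Sequence

SAMPLE_MASK = 0xFFF  # assumed 12-bit sample width


def gate_digital(dd: Sequence[int], e1: int, e2: int) -> list[int]:
    """Model of gate circuit 20: pass Dd only while E1 AND E2 are both logic 1."""
    # ANDing every bit of a sample with the enable signal either keeps
    # the sample unchanged (enable = 1) or forces it to zero (enable = 0).
    mask = SAMPLE_MASK if (e1 and e2) else 0
    return [sample & mask for sample in dd]


passed = gate_digital([0x123, 0xFFF, 0x000], e1=1, e2=1)
blocked = gate_digital([0x123, 0xFFF, 0x000], e1=1, e2=0)
```

A signed sample format would need a different "silence" value than the all-zero code, which is why the unsigned assumption is stated explicitly.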

The microphones 1a, 1b, 1c, 1d are arranged in parallel at the corners of a square so that adjacent microphones are a distance d apart, for example with side length d = 30 mm. Alternatively, to correspond to human hearing, they may be spaced about d = 106 mm apart so that the diagonal of the square is 150 mm, roughly the distance between a person's ears.
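The 106 mm figure follows directly from the geometry of a square, whose diagonal is √2 times its side:

```latex
\sqrt{2}\,d = 150\ \text{mm}
\;\Longrightarrow\;
d = \frac{150}{\sqrt{2}}\ \text{mm} \approx 106\ \text{mm}
```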

For the sound source localization means 3, the auditory sensor system described in the known art of [Patent Document 2] and [Non-Patent Document 1] can be used.

That is, in FIG. 1, the A/D-converted digital acoustic signals Da, Db, Dc, Dd are each stored in a sampling data buffer 4, ...; a synthesis calculator 5a computes their sum (the synthesized sound field f); and the synthesized sound field f is input to a low-pass filter 6 and also to a time differentiation calculator 7, which yields the time gradient ft.

The spatial gradients fx and fy in the X and Y directions are computed by an X-direction spatial gradient calculator 5b and a Y-direction spatial gradient calculator 5c, respectively. The output fx of the X-direction spatial gradient calculator 5b is input to a low-pass filter 8, and the output fy of the Y-direction spatial gradient calculator 5c is input to a low-pass filter 9.

Next, the sound source angle calculator 10 uses the output signals f, ft, fx, fy of the low-pass filter 6, the time differentiation calculator 7, the low-pass filter 8, and the low-pass filter 9 to calculate the direction of the sound source, namely the angle θ of the source from the polar axis in three-dimensional spherical coordinates (r, θ, φ) whose origin is the sound-collection center of the microphones and whose polar axis is the direction in which the microphones point.
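The gradient-based angle calculation can be sketched in software as below. This is a simplified illustration and not the algorithm of the cited documents: it assumes a far-field plane wave (so the range r is not estimated), omits the low-pass filters, uses the mean rather than the sum of the four signals for f so that f and its gradients share one scale, and assumes a corner assignment for microphones 1a to 1d that the text does not specify. All function names and the speed-of-sound constant are hypothetical.

```python
import math

C = 343.0  # assumed speed of sound in air, m/s


def estimate_direction(sa, sb, sc, sd, d, fs):
    """Estimate (theta, phi) in radians from four corner-microphone signals.

    Assumed layout in the array plane (not given in the source text):
    1a (-d/2, -d/2), 1b (+d/2, -d/2), 1c (+d/2, +d/2), 1d (-d/2, +d/2).
    """
    n = len(sa)
    # Synthesized field f and finite-difference spatial gradients fx, fy
    f = [(sa[i] + sb[i] + sc[i] + sd[i]) / 4.0 for i in range(n)]
    fx = [((sb[i] + sc[i]) - (sa[i] + sd[i])) / (2.0 * d) for i in range(n)]
    fy = [((sc[i] + sd[i]) - (sa[i] + sb[i])) / (2.0 * d) for i in range(n)]
    # For a plane wave, fx = (ux / c) ft and fy = (uy / c) ft.
    # Solve for ux, uy by least squares over the block (sums of products).
    stt = sxt = syt = 0.0
    for i in range(1, n - 1):
        ft = (f[i + 1] - f[i - 1]) * fs / 2.0  # central-difference time gradient
        stt += ft * ft
        sxt += fx[i] * ft
        syt += fy[i] * ft
    ux, uy = C * sxt / stt, C * syt / stt
    theta = math.asin(min(1.0, math.hypot(ux, uy)))  # cone angle from the polar axis
    phi = math.atan2(uy, ux)
    return theta, phi


def simulate(theta, phi, d, fs, f0=200.0, n=2400):
    """Four microphone signals for a plane-wave tone arriving from (theta, phi)."""
    ux = math.sin(theta) * math.cos(phi)
    uy = math.sin(theta) * math.sin(phi)
    corners = [(-d / 2, -d / 2), (d / 2, -d / 2), (d / 2, d / 2), (-d / 2, d / 2)]
    # A microphone displaced toward the source receives the wavefront earlier.
    return [[math.sin(2 * math.pi * f0 * (i / fs + (ux * x + uy * y) / C))
             for i in range(n)] for x, y in corners]


sa, sb, sc, sd = simulate(math.radians(30), math.radians(40), d=0.03, fs=48000)
theta, phi = estimate_direction(sa, sb, sc, sd, d=0.03, fs=48000)
```

The finite differences are accurate only while the wavelength is long compared with d, which is one reason band-limiting the inputs matters in a real implementation.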

The sound source angle determination means 11 is a comparison calculator that determines whether the calculated source direction is within the preset angle range δ and outputs a logic signal E1. A simple logic circuit suffices: it takes the cone angle θ1 of the source from the predetermined reference line (the polar axis), as obtained by the sound source localization means 3, and decides by the subtraction (δ - θ1) whether θ1 is within the predetermined angle δ. (Because the coordinates are three-dimensional, direction is expressed by the cone angle (colatitude) θ and the azimuth (longitude) φ of the spherical coordinates (r, θ, φ).) The circuit outputs a logic "1" signal if the source is within the angle range and a logic "0" signal if it is outside.
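The comparison itself is small enough to show directly. The sketch below assumes angles in degrees and treats the boundary θ1 = δ as inside the range; the text does not say which way the boundary falls, and the function name is hypothetical.

```python
def source_angle_in_range(theta1_deg: float, delta_deg: float) -> int:
    """Logic signal E1: 1 if cone angle theta1 is within the set angle delta, else 0."""
    # The decision is taken from the sign of the difference (delta - theta1).
    return 1 if (delta_deg - theta1_deg) >= 0.0 else 0


e1_inside = source_angle_in_range(theta1_deg=10.0, delta_deg=20.0)
e1_outside = source_angle_in_range(theta1_deg=25.0, delta_deg=20.0)
```

This covers only a cone centered on the polar axis; an off-axis acceptance range would also need the azimuth φ in the comparison.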

For the voice detection means 12, it is preferable to use the speech analysis apparatus described in the known art of [Patent Document 1].

That is, in FIG. 1, the A/D-converted digital acoustic signal Dd is input to a linear prediction analyzer 13, which calculates the partial autocorrelation (PARCOR) coefficients K[i] and the residual EN. From these, a PARCOR analysis filter 14 calculates the prediction residual ed[i], which passes through a low-pass filter 15 to an average magnitude difference function calculator 16 that calculates the average magnitude difference function ra[k]. Next, a minimum point detector 17 detects the minimum point ramin of the average magnitude difference function, and a pitch extractor 18 extracts the pitch. A speech section determiner 19 decides from the PARCOR coefficients K[i], the residual EN, the pitch, and the minimum point ramin whether the sound is a human voice, and outputs a logic signal E2: for example, "1" for a human voice and "0" for any other sound.
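The role of the average magnitude difference function in this chain can be illustrated with a much-reduced sketch. It applies the function directly to the input samples rather than to a low-pass-filtered PARCOR prediction residual, and it decides "voiced" from the valley depth alone; the search band (50 to 400 Hz), the depth threshold, and all function names are assumptions made for illustration, not values from [Patent Document 1].

```python
import math
import random


def amdf(x, max_lag):
    """Average magnitude difference function ra[k] for k = 0..max_lag."""
    n = len(x) - max_lag  # same number of terms for every lag
    return [sum(abs(x[i] - x[i + k]) for i in range(n)) / n
            for k in range(max_lag + 1)]


def detect_pitch(x, fs, f_min=50.0, f_max=400.0, depth=0.3):
    """Return (is_voiced, pitch_hz); pitch_hz is None when no deep valley exists."""
    k_min, k_max = int(fs / f_max), int(fs / f_min)
    ra = amdf(x, k_max)
    lags = range(k_min, k_max + 1)
    lo = min(ra[k] for k in lags)
    mean = sum(ra[k] for k in lags) / len(lags)
    if mean == 0.0 or lo / mean > depth:
        return False, None  # valley too shallow: treated as not voiced
    # Take the first lag that enters a deep valley, then walk down to its
    # local minimum, so that multiples of the true period are not chosen.
    limit = lo + 0.1 * (max(ra[k] for k in lags) - lo)
    k = next(k for k in lags if ra[k] <= limit)
    while k < k_max and ra[k + 1] < ra[k]:
        k += 1
    return True, fs / k


fs = 8000
tone = [math.sin(2 * math.pi * 200.0 * i / fs) for i in range(800)]
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(800)]

tone_result = detect_pitch(tone, fs)
noise_result = detect_pitch(noise, fs)
```

A periodic input produces a deep valley at its period, while noise leaves the function nearly flat. A pure tone is periodic without being speech, which is one reason the determiner described above also uses the PARCOR coefficients and the residual.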

Thus, given the digital acoustic signal Dd and the logic signals E1 and E2 input to the AND logic circuit of the gate circuit 20, the gate circuit 20 outputs the digital acoustic signal Dd only when the sound comes from a source within the predetermined angle range δ and is a human voice. As shown in FIG. 4, what reaches the speech recognition device 31 of the car navigation system 32 with a speech recognition function through the acoustic signal input/output device 30 is therefore the voice of the person who is the sound source A within the predetermined angle range δ. It is desirable to remove sounds outside the human voice band as far as possible with a filter (not shown).

Next, the acoustic signal input/output device 40 shown in FIG. 2 comprises, like the acoustic signal input/output device 30 described above: acoustic signal input means 1 formed by combining a plurality of microphones 1a, 1b, 1c, 1d spaced a predetermined distance d apart; an A/D converter 2 that converts the analog acoustic signals Sa, Sb, Sc, Sd of the microphones 1a, 1b, 1c, 1d obtained by the acoustic signal input means 1 into digital acoustic signals; sound source localization means 3 that analyzes the digital acoustic signals Da, Db, Dc, Dd converted by the A/D converter 2 and detects the direction of the sound source; sound source angle determination means 11 that determines whether the sound source A localized by the sound source localization means 3 lies in a direction within a preset angle range δ; and voice detection means 12 that determines whether the sound source A is a human voice by analyzing the digital acoustic signal Dd of the microphone 1d converted by the A/D converter 2. It further comprises a gate circuit 21 that outputs the analog acoustic signal Sd only when the sound source A obtained by the acoustic signal input means 1 lies in a direction within the preset angle range δ and a human voice is included.

In the acoustic signal input/output device 40, the gate circuit 21 takes the logical product of the output signal E1 of the sound source angle determination means 11 and the output signal E2 of the voice detection means 12 in an AND logic circuit 21b, and the AND output E3 opens and closes an output gate switch 21a (built from a switching transistor or the like) for the analog acoustic signal Sd of the microphone 1d. The difference from the acoustic signal input/output device 30 is that the gate circuit 20, which outputs the digital acoustic signal Dd, is replaced by the gate circuit 21, which outputs the analog acoustic signal Sd.

The digital acoustic signal input to the voice detection means 12 is not limited to the digital acoustic signal Dd of the microphone 1d in FIG. 2; it may be any one of the digital acoustic signals Da, Db, Dc of the other microphones 1a, 1b, 1c, or several of them. Likewise, the analog acoustic signal supplied to the gate circuit 21 is not limited to the analog acoustic signal Sd of FIG. 2; it may be any of the analog acoustic signals Sa, Sb, Sc of the other microphones 1a, 1b, 1c, or several of them. In a typical system, however, any one signal is sufficient.

The digital acoustic signal Dd or the analog acoustic signal Sd output by the acoustic signal input/output device 30 or 40 is input to the speech recognition device (speech recognition circuit) 31 or 41 of any of various systems with a digital-input or analog-input speech recognition function, such as the car navigation system 32 of FIG. 4 or the person authentication system 42 of FIG. 5.

Thus, sound from the sound sources B and C outside the angle range δ is not output from the acoustic signal input/output devices 30, 40 to the speech recognition devices 31, 41, even if it is a human voice. Recognition errors of the speech recognition devices 31, 41, and in turn malfunctions of systems such as the car navigation system 32 and the person authentication system 42, are markedly reduced.

Although FIGS. 4 and 5 show the set angle ranges δ and δ' in two dimensions, the actual input angles δ and δ' are set as angles expressed by the cone angle θ from the polar axis Z of the three-dimensional spherical coordinates.

In the acoustic signal input/output device 30 or 40, the set angle range δ need not face the front of the microphones 1a, ... as in FIG. 4; an arbitrary angle range δ' can be set, as in FIG. 5.

For example, suppose the acoustic signal input/output device 30 is connected to the acoustic signal input interface of the speech recognition device 31 in the car navigation system 32 of FIG. 4, where sound source A is the driver's voice, sound source B is the voice of the person in the passenger seat, and sound source C is the audio from the radio speaker. With the input angle set as shown in the figure, the device is triggered only by the speech of sound source A, and only the driver's voice is input to the speech recognition device 31 of the car navigation system 32. The passenger's voice (sound source B) and the radio audio (sound source C) are outside the range of the angle δ, so they are not input to the speech recognition device 31 as acoustic signals, and malfunction of the speech recognition device 31 is prevented. If the passenger's voice (sound source B) is to be the target source instead, this is achieved simply by changing the set angle range δ of the sound source localization means 3.

The present invention aims to reduce misrecognition by the voice recognition devices of various systems having a voice recognition function, such as conventional car navigation systems and person authentication systems. Its ingenuity lies in focusing on both the effect of a three-dimensional sound source localization sensor system, originally developed as an auditory sensor system for robots, and the effect of a speech analysis device for precise speech analysis, and in combining the two organically into a new means of improving speech recognition accuracy. Simply adding the device to the input side of an existing voice recognition device (voice recognition circuit) improves recognition accuracy easily and dramatically, which makes it highly useful.

FIG. 1 is a block diagram showing a configuration example of the first acoustic signal input/output device according to the present invention.
FIG. 2 is a block diagram showing a configuration example of the second acoustic signal input/output device according to the present invention.
FIG. 3 is a perspective view showing the configuration of the acoustic signal input means according to the present invention, formed by combining a plurality of microphones spaced a predetermined interval apart.
FIG. 4 is a diagram for explaining a system that combines the digital-output acoustic signal input/output device of the present invention with a car navigation system incorporating a digital-input voice recognition device.
FIG. 5 is a diagram for explaining a system that combines the analog-output acoustic signal input/output device of the present invention with a person authentication system incorporating an analog-input voice recognition device.

Explanation of symbols

1 Acoustic signal input means
1a, 1b, 1c, 1d Microphones
2 A/D converter
3 Sound source localization means
4 Sampling data buffer
5a Synthesis calculator
5b X-direction spatial gradient calculator
5c Y-direction spatial gradient calculator
6, 8, 9 Low-pass filters
7 Time differentiation calculator
10 Sound source angle calculator
11 Sound source angle determination means
12 Voice detection means
13 Linear prediction analyzer
14 PARCOR analysis filter
15 Low-pass filter
16 Average amplitude difference function calculator
17 Minimum point detector
18 Pitch extractor
19 Voice section determiner
20, 21 Gate circuits
21a Output gate switch
21b AND logic circuit
30, 40 Acoustic signal input/output devices
A, B, C Sound sources
Da, Db, Dc, Dd Digital acoustic signals
Sa, Sb, Sc, Sd Analog acoustic signals
E1 Logic signal output by the sound source angle determination means
E2 Logic signal output by the voice detection means
E3 AND logic output of the AND logic circuit
δ, δ′ Preset angle ranges
d Predetermined interval
θ1 Cone angle
f Synthesized sound field
ft Time gradient
fx Spatial gradient in the X direction
fy Spatial gradient in the Y direction

Claims

An acoustic signal input/output device comprising: an acoustic signal input means formed by combining a plurality of microphones spaced a predetermined interval apart; an A/D converter that converts the analog acoustic signal of each microphone obtained by the acoustic signal input means into a digital acoustic signal; a sound source localization means that detects the direction of a sound source by analyzing the digital acoustic signal of each microphone converted by the A/D converter; a sound source angle determination means that determines whether the sound source localized by the sound source localization means is a sound source from a direction within a preset angle range; a voice detection means that determines whether the sound source is a human voice by analyzing the digital acoustic signal of the microphone converted by the A/D converter; and a gate circuit that outputs the digital acoustic signal only when the sound source picked up by the acoustic signal input means lies in a direction within the preset angle range and contains a human voice.

An acoustic signal input/output device comprising: an acoustic signal input means formed by combining a plurality of microphones spaced a predetermined interval apart; an A/D converter that converts the analog acoustic signal of each microphone obtained by the acoustic signal input means into a digital acoustic signal; a sound source localization means that detects the direction of a sound source by analyzing the digital acoustic signal of each microphone converted by the A/D converter; a sound source angle determination means that determines whether the sound source localized by the sound source localization means is a sound source from a direction within a preset angle range; a voice detection means that determines whether the sound source is a human voice by analyzing the digital acoustic signal of the microphone converted by the A/D converter; and a gate circuit that outputs the analog acoustic signal only when the sound source picked up by the acoustic signal input means lies in a direction within the preset angle range and contains a human voice.