JPH04318900A - Multidirectional simultaneous sound collection type voice recognizing method - Google Patents

Multidirectional simultaneous sound collection type voice recognizing method

Info

Publication number
JPH04318900A
JPH04318900A JP3086645A JP8664591A
Authority
JP
Japan
Prior art keywords
voice
identification
recognition
section
noise
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP3086645A
Other languages
Japanese (ja)
Other versions
JP3163109B2 (en)
Inventor
Takashi Miki
三木 敬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Oki Electric Industry Co Ltd
Original Assignee
Oki Electric Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oki Electric Industry Co Ltd filed Critical Oki Electric Industry Co Ltd
Priority to JP08664591A priority Critical patent/JP3163109B2/en
Publication of JPH04318900A publication Critical patent/JPH04318900A/en
Application granted granted Critical
Publication of JP3163109B2 publication Critical patent/JP3163109B2/en
Anticipated expiration legal-status Critical
Expired - Lifetime legal-status Critical Current

Links

Abstract

PURPOSE: To obtain stable recognition performance even in environments where the distance and direction between the vocal organs and a microphone change and the background noise environment changes. CONSTITUTION: Voices collected simultaneously through plural microphones are input from input terminals 101, 102, and 103 and passed through voice analysis parts 104, 105, and 106 and voice section detection parts 107, 108, and 109; comparison pattern memory parts 110, 111, and 112 are referred to, so that voice identification parts 113, 114, and 115 recognize the voices independently of one another. A total decision part 116 comprehensively judges the results of the independent recognition together with the auxiliary identification information (identification accuracy, start and end times of the voice, and signal-to-noise ratio) and outputs the final recognition result to an output terminal 117.

Description

[Detailed Description of the Invention]

[0001]

[Field of Industrial Application] The present invention relates to a multidirectional simultaneous sound-collection speech recognition method that picks up and recognizes speech from multiple directions at the same time.

[0002]

[Prior Art] Speech recognition devices are beginning to be used as a powerful input means for computers and various other devices. FIG. 4 is a block diagram of a typical conventional recognition device, as shown in Japanese Unexamined Patent Publication No. 62-73298, in which 401 is a voice input terminal, 402 a voice analysis section, 403 a voice section detection section, 404 a comparison matching pattern memory section, 405 a similarity calculation section, and 406 a determination section.

[0003] The voice signal input from the voice input terminal 401 is converted by the voice analysis section 402 into a time series of feature vectors representing the voice features.

[0004] The voice section detection section 403 determines the section in which voice is present, i.e., the voice section, based on the feature vectors from the voice analysis section 402. The feature vector sequence from the start of the voice to its end is called a voice matching pattern.

[0005] Next, the function of the comparison matching pattern memory section 404 will be explained. In speaker-dependent speech recognition, in which the speakers are limited, each word to be recognized (referred to as a category) must be uttered in advance (this utterance is called the learning voice) and subjected to the same voice analysis, and the resulting voice matching pattern (referred to as a comparison matching pattern) must be stored. The comparison pattern memory section 404 stores such comparison matching patterns; this storage operation is called registration processing. In speech recognition that does not limit the speakers (referred to as speaker-independent speech recognition), comparison matching patterns of various speakers are stored in the comparison pattern memory section 404 in advance.

[0006] The similarity calculation section 405 calculates the similarity between the input matching pattern generated from the input voice to be recognized and the comparison matching patterns. Various similarity calculation methods have been proposed, including the well-known DP matching and the simple linear matching shown in Japanese Unexamined Patent Publication No. 62-73299, and any suitable one of them may be used.

[0007] Using the similarity for each category output from the similarity calculation section 405, the determination section 406 outputs, as the recognition result, the category name assigned to the comparison matching pattern that gives the maximum similarity.

[0008]

[Problems to be Solved by the Invention] In the conventional recognition device described above, sound is picked up with a single microphone. With an input form in which the distance and direction between the vocal organs and the microphone are constant, such as a close-talking microphone, the recognition capability of the speech recognition device could always be exercised to the full. With input forms in which the distance and direction between the vocal organs and the microphone change greatly, however, recognition performance often deteriorated drastically. Moreover, even with a close-talking microphone, recognition performance was unstable in environments where the surrounding background noise sources changed greatly, and malfunctions and misrecognitions sometimes occurred. An object of the present invention is to provide a speech recognition method that obtains stable, high recognition performance even in environments where the distance and direction between the vocal organs and the microphone change greatly and the background noise environment changes greatly.

[0009]

[Means for Solving the Problems] To achieve this object, the present speech recognition method comprises, as a first configuration: a process of simultaneously picking up sound with a plurality of microphones to obtain a plurality of input signals; a process of performing voice identification on each of the plurality of input signals independently to obtain a plurality of identification results; and a process of making an integrated judgment on the plurality of identification results.

[0010] As a second configuration, the present speech recognition method further comprises: a process of determining, based on the recognition result established by the above speech recognition method, the main input system whose speech-signal-to-background-noise ratio is the best among the input systems and the noise input system whose ratio is the worst; and a process of constructing an adaptive noise removal filter from the main input system and the noise input system. In subsequent recognition processing, voice identification is performed on the input signal after the adaptive noise removal filtering.

[0011]

[Operation] In the first configuration, identification is performed independently on each of the plural input signals picked up simultaneously by the plural microphones, and the recognition judgment is made by evaluating the plural identification results and the auxiliary identification information comprehensively in an integrated determination section. In the second configuration, an adaptive noise removal filter is constructed based on the recognition result established by the first configuration, and subsequent recognition judgments are made on the input signal after this filtering. With these configurations, speech recognition can be performed on the input signal with the best signal-to-noise ratio, taking the position of the noise source into account. Therefore, the speech recognition method of the present invention realizes a speech recognition device that obtains stable, high recognition performance even in environments where the distance and direction between the vocal organs and the microphone change greatly and the background noise environment changes greatly.

[0012]

[Embodiments] Embodiments of the present invention are described below. Here, the term voice matching pattern means a pattern created by the generation process common to the input matching pattern and the comparison matching pattern.

[0013] FIG. 1 is a block diagram showing a first embodiment of the present invention, applied to a speaker-dependent recognition system that performs a registration operation. 101, 102, and 103 are the voice input terminals of the first, second, and third channels; 104, 105, and 106 are the voice analysis sections of the first, second, and third channels; 107, 108, and 109 are the voice section detection sections of the first, second, and third channels; 110, 111, and 112 are the comparison pattern memory sections of the first, second, and third channels; 113, 114, and 115 are the voice identification sections of the first, second, and third channels; 116 is the integrated determination section; and 117 is the output terminal.

[0014] In the first embodiment an example with three voice input lines is shown to simplify the explanation, but a larger number of lines may be provided without any problem. The functional blocks 101, 104, 107, 110, and 113 are collectively called the first-channel voice processing section; similarly, the functional blocks 102, 105, 108, 111, and 114 are collectively called the second-channel voice processing section, and the functional blocks 103, 106, 109, 112, and 115 the third-channel voice processing section.

[0015] First, the identification processing of the voice signal input to the first channel will be explained. The voice signal input from the voice input terminal 101 is converted by the voice analysis section 104 into a time series of feature vectors representing the characteristics of the voice. Conceivable methods for deriving the feature vectors include using a group of bandpass filters whose center frequencies differ slightly from one another, and spectral analysis by the FFT (fast Fourier transform); here, the method using a group of bandpass filters is taken as the example.

[0016] The voice signal is analog-to-digital converted in the voice analysis section 104, after which each bandpass filter extracts only its own frequency component. The data series sorted out by each bandpass filter in this way is called a channel. The output of the filter for each channel is rectified to take its absolute value, and the average is computed frame by frame; this value becomes the magnitude of that channel's component of the feature vector for the frame. That is, the feature vector Ai in the i-th frame is Ai = (Ai1, Ai2, ..., Aip), where p is the number of channels.
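
As a rough illustration of this analysis step, the sketch below computes frame-averaged band energies with a bandpass filter bank. The sampling rate, band edges, filter order, and frame length are illustrative assumptions; the patent does not specify them.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def band_energy_features(signal, fs=8000, band_edges=None, frame_len=256):
    """Per-frame band energies: filter, rectify (absolute value), and
    average within each frame, giving Ai = (Ai1, ..., Aip) per frame i.

    fs, band_edges, the filter order, and frame_len are assumptions;
    the patent does not specify them.
    """
    if band_edges is None:
        band_edges = np.linspace(200, 3400, 9)        # 8 channels
    bands = list(zip(band_edges[:-1], band_edges[1:]))
    n_frames = len(signal) // frame_len
    feats = np.empty((n_frames, len(bands)))
    for j, (lo, hi) in enumerate(bands):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        rectified = np.abs(sosfilt(sos, signal))      # rectify filter output
        for i in range(n_frames):
            feats[i, j] = rectified[i * frame_len:(i + 1) * frame_len].mean()
    return feats    # shape (n_frames, p): one feature vector per frame
```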

[0017] The voice section detection section 107 determines the section in which voice is present, i.e., the voice section, based on the feature vectors from the voice analysis section 104.

[0018] The comparison pattern memory section 110 stores the comparison matching patterns of the categories to be recognized; the operation of storing these comparison matching patterns is called registration processing. Here, to simplify the explanation, the registration process is illustrated by the case where the learning voice is uttered once per category. When the total number of categories is N, N comparison matching patterns S1n (n = 1, 2, ..., N) are stored in the comparison pattern memory.

[0019] When voice matching patterns are compared with each other, the two must be brought into temporal correspondence. A representative method of calculating the similarity between two patterns while finding the optimal correspondence is the so-called DP matching method shown in Japanese Examined Patent Publication No. 50-23941. The voice identification section 113 calculates the similarity between patterns using such a DP matching method or another suitable method; that is, it calculates the similarity between the input matching pattern I1 generated from the input voice to be recognized and every comparison matching pattern S1n in the comparison pattern memory section 110, obtaining the voice feature similarities X1n.

[0020] Of the voice feature similarities X1n obtained for the respective patterns, the category name C1 assigned to the comparison matching pattern giving the maximum value P1 is output to the integrated determination section 116 as the voice identification result for the channel. In addition, the following are output to the integrated determination section 116 as auxiliary identification information: the maximum similarity P1 (referred to as the identification accuracy), the voice start and end times VS1 and VE1, and the ratio SN1 (referred to as the signal/noise ratio) of the voice level (the signal level in the voice section) to the noise level (the signal level in the no-input state immediately before the voice section).
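
The signal/noise ratio defined here admits a simple realization, sketched below under the assumption that a per-frame level signal (e.g., the summed band energies) is available; the number of pre-speech frames averaged for the noise level is an assumed parameter.

```python
import numpy as np

def signal_noise_ratio(level, vs, ve, noise_len=10):
    """SN per paragraph [0020]: voice-section level over the level of the
    no-input state immediately before the voice section.

    `level` is a per-frame signal level; vs/ve are the detected voice
    start/end frames. noise_len is an assumed parameter.
    """
    voice_level = level[vs:ve].mean()
    pre = level[max(0, vs - noise_len):vs]
    noise_level = pre.mean() if pre.size else 1e-8
    return voice_level / max(noise_level, 1e-8)   # guard against silence
```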

[0021] The voice signal input to the second channel is identified in exactly the same way as described above for the first channel. That is, the voice signal input from the voice input terminal 102 is converted by the voice analysis section 105 into a time series of feature vectors representing the voice features; the voice section detection section 108 determines the voice section based on those feature vectors, and the feature vector sequence of that section, i.e., the input matching pattern I2, is generated. Next, the voice identification section 114 calculates the similarity between the input matching pattern I2 and every comparison matching pattern S2n in the comparison pattern memory section 111, obtaining the voice feature similarities X2n.

[0022] Of the voice feature similarities X2n for the respective patterns, the category name C2 assigned to the comparison matching pattern giving the maximum value is output to the integrated determination section 116 as the voice identification result for the channel, and the identification accuracy P2, the voice start and end times VS2 and VE2, the signal/noise ratio SN2, and so on are output as auxiliary identification information.

[0023] The voice signal input to the third channel is likewise identified in exactly the same way as the first channel, and the voice identification result C3 and the auxiliary identification information (identification accuracy P3, voice start and end times VS3 and VE3, signal/noise ratio SN3, etc.) are output to the integrated determination section 116.

[0024] The integrated determination section 116 makes a comprehensive judgment on the voice identification results and auxiliary identification information output from the voice processing sections of the respective channels, and sends the final recognition result from the output terminal 117 to an external host or the like. Various judgment methods are conceivable: for a group of identification results whose voice start and end times are nearly equal, accepting the identification result only when a majority of the results agree; accepting the identification result with the highest identification accuracy P; or accepting the identification result of the channel with the highest signal/noise ratio SN. The final judgment may be computed by any of the methods above or by another suitable method.
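
One assumed realization of this integrated judgment, combining the majority rule with the identification accuracy as a tie-breaker, might look as follows; the dictionary keys and the time tolerance are illustrative, not part of the patent.

```python
from collections import Counter

def integrated_decision(results, time_tolerance=0.1):
    """Integrated judgment over per-channel results.

    Each element of `results` is a dict with keys C (category name),
    P (identification accuracy), VS/VE (voice start/end times), and
    SN (signal/noise ratio) -- the auxiliary information of [0020].
    """
    # Keep only channels whose voice sections roughly coincide.
    ref = max(results, key=lambda r: r["SN"])
    group = [r for r in results
             if abs(r["VS"] - ref["VS"]) <= time_tolerance
             and abs(r["VE"] - ref["VE"]) <= time_tolerance]
    category, count = Counter(r["C"] for r in group).most_common(1)[0]
    if 2 * count > len(group):                    # strict majority agrees
        return category
    return max(group, key=lambda r: r["P"])["C"]  # else highest accuracy
```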

[0025] FIG. 2 is a block diagram showing a second embodiment of the present invention. In the second embodiment, the voice analysis section 204, the voice section detection section 207, and the voice identification section 213 are operated in time-division fashion on the three voice input signals 201, 202, and 203, so that voice identification is performed on each signal independently, yielding the voice identification results C1, C2, C3 and the auxiliary identification information (identification accuracies P1, P2, P3; voice start times VS1, VS2, VS3; voice end times VE1, VE2, VE3; signal/noise ratios SN1, SN2, SN3). The comparison pattern memory section 210 is shared. The processing of the integrated determination section 216 is the same as in the first embodiment, and the final recognition result is sent from the output terminal 217 to an external host or the like.

[0026] FIG. 3 is a block diagram showing a third embodiment of the present invention. The operation of the third embodiment is divided into a noise-adaptive processing mode and a non-noise-adaptive processing mode. In the non-noise-adaptive processing mode, the adaptive noise removal section 318 is bypassed, and the recognition processing operates exactly as in the second embodiment. The noise-adaptive processing mode is entered after speech recognition has been performed in the non-noise-adaptive processing mode; in it, the following processing is performed prior to recognition.

[0027] The main input system and the noise input system are determined from the voice identification results C1, C2, C3 of the immediately preceding recognition, the auxiliary identification information (identification accuracies P1, P2, P3; voice start times VS1, VS2, VS3; voice end times VE1, VE2, VE3; signal/noise ratios SN1, SN2, SN3), and the finally confirmed recognition result Cr (sent from the external host). The determination algorithm is as follows.

[0028] (1) Any input system whose voice identification result matches the finally confirmed recognition result Cr is selected as a correct input system. If there is no such input system, the noise-adaptive mode is not entered, and the next recognition is also performed in the non-noise-adaptive mode.
(2) Among the correct input systems, the one with the highest signal/noise ratio SN is taken as the main input system.
(3) Among the input systems other than the main input system, the one with the lowest signal/noise ratio SN is taken as the noise input system.
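
A direct transcription of rules (1) to (3) into code might look like this; the per-channel result records reuse the keys assumed in the earlier sketches.

```python
def select_input_systems(results, confirmed_cr):
    """Rules (1)-(3): pick the main and noise input systems.

    Returns (main, noise) channel indices, or None when rule (1) finds
    no correct system and the non-noise-adaptive mode must be kept.
    """
    correct = [i for i, r in enumerate(results) if r["C"] == confirmed_cr]
    if not correct:
        return None                                      # rule (1)
    main = max(correct, key=lambda i: results[i]["SN"])  # rule (2)
    others = [i for i in range(len(results)) if i != main]
    if not others:
        return None                                      # single channel
    noise = min(others, key=lambda i: results[i]["SN"])  # rule (3)
    return main, noise
```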

[0029] Next, the recognition processing in the noise-adaptive processing mode will be explained. First, the adaptive noise removal section 318 performs adaptive noise removal filtering on the input signal of the main input system and the input signal of the noise input system to generate a noise-removed voice signal. Thereafter, voice identification is performed on the noise-removed voice signal as in the first embodiment. That is, the noise-removed voice signal is converted by the voice analysis section 304 into a time series of feature vectors representing its features. The voice section detection section 307 determines the voice section based on the feature vectors from the voice analysis section 304, and the feature vector sequence of that section, i.e., the input matching pattern I, is generated. Next, the voice identification section 313 calculates the similarity between the input matching pattern I and every comparison matching pattern Sn in the comparison pattern memory section 310 to obtain the voice feature similarities Xn. Of these similarities Xn, the category name C assigned to the comparison matching pattern giving the maximum value is output to the integrated determination section 316 as the voice identification result. The integrated determination section 316 sends the category name C as it is from the output terminal 317 to an external host or the like as the final recognition result.

[0030] Furthermore, after the recognition result has been confirmed, correct/incorrect information on the recognition result is received from the host. If the recognition result is incorrect, the selection of the main input system and the noise input system is considered to have become inappropriate owing to a change in the noise environment or the like, so the device returns to the non-noise-adaptive processing mode and reselects the main input system and the noise input system. Conversely, if the recognition result is correct, all settings are considered appropriate, and recognition continues in the noise-adaptive processing mode. This selection of the processing mode based on the recognition result is only one example; the user may select the processing mode instead, or the decision to change the processing mode may be based on the frequency of misrecognitions. FIG. 5 shows the operation flow of the third embodiment described above.

[0031] The adaptive noise removal filtering process itself is well documented, for example in the course "Applications of Digital Filters in Microphone Systems" in the Journal of the Acoustical Society of Japan, Vol. 45, No. 2 (1989) and the references cited there, so its explanation is omitted here. Note, however, that the adaptation is halted from the moment the voice section detection section 307 detects the start of the voice until the end of the voice is detected (see FIG. 5); during this interval the coefficients of the adaptive removal filter in the adaptive noise removal section 318 are held fixed.
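
Although the patent defers the filter details to the literature, a common choice for this two-input arrangement is a (normalized) LMS adaptive canceller; the sketch below is one such assumed realization, with adaptation frozen inside detected voice sections as the text requires. The tap count and step size are illustrative.

```python
import numpy as np

def adaptive_noise_cancel(main_sig, noise_sig, in_speech,
                          n_taps=32, mu=0.5):
    """Two-input adaptive noise canceller (normalized LMS).

    The noise-input channel is filtered to predict the noise component
    of the main channel; the prediction error is the noise-removed
    voice signal. NLMS, n_taps, and mu are assumptions. `in_speech[k]`
    is True inside a detected voice section, where the coefficients
    are held fixed, as the patent requires.
    """
    w = np.zeros(n_taps)
    out = np.zeros(len(main_sig))
    for k in range(n_taps, len(main_sig)):
        x = noise_sig[k - n_taps:k][::-1]     # reference tap vector
        e = main_sig[k] - w @ x               # error = cleaned sample
        out[k] = e
        if not in_speech[k]:                  # adapt only outside speech
            w += mu * e * x / (x @ x + 1e-8)  # NLMS coefficient update
    return out
```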

[0032]

[Effects of the Invention] As explained in detail above, according to the present invention, input signals picked up simultaneously by a plurality of microphones are each identified independently and the recognition judgment is made comprehensively from the plural recognition results; furthermore, as shown in the third embodiment, an adaptive noise removal filter is constructed based on the recognition result obtained in the non-noise-adaptive processing mode, and after the transition to the noise-adaptive processing mode the recognition judgment is made on the noise-removed voice signal. This yields the advantage that stable, high recognition performance is obtained even in environments where the distance and direction between the vocal organs and the microphone change greatly and the background noise environment changes greatly. For example, when a speech recognition device is used where the position and magnitude of noise sources change irregularly, such as inside a car, in an office, in a factory, or on the street, the adaptive operation according to the present invention can improve recognition performance remarkably.
[Effects of the Invention] As explained in detail above, according to the present invention, the input signals picked up simultaneously from a plurality of microphones are each independently discriminated, and the plurality of recognition results are used to comprehensively perform the discrimination operation. In order to perform recognition judgment, and as shown in the third embodiment, an adaptive noise removal filter is configured based on the recognition result obtained in the no-noise adaptive processing mode, and after shifting to the noise-adaptive processing mode, the noise-removed voice is Since recognition is determined based on signals, the advantage is that stable and high recognition performance can be obtained even in environments where the distance and direction between the vocal organs and the microphone change significantly, and where the background noise environment changes significantly. . For example, when using a speech recognition device in a place where the position and size of the noise source change irregularly, such as in a car, office, factory, or on the street, the adaptive operation according to the present invention can significantly improve recognition performance. can be improved.

[Brief Description of the Drawings]

[FIG. 1] A block diagram showing the first embodiment of the present invention.

[FIG. 2] A block diagram showing the second embodiment of the present invention.

[FIG. 3] A block diagram showing the third embodiment of the present invention.

[FIG. 4] A block diagram of a conventional recognition device.

[FIG. 5] A diagram showing the operation flow of the third embodiment.

[Explanation of Symbols]

101 First-channel voice input terminal
102 Second-channel voice input terminal
103 Third-channel voice input terminal
104 First-channel voice analysis section
105 Second-channel voice analysis section
106 Third-channel voice analysis section
107 First-channel voice section detection section
108 Second-channel voice section detection section
109 Third-channel voice section detection section
110 First-channel comparison pattern memory section
111 Second-channel comparison pattern memory section
112 Third-channel comparison pattern memory section
113 First-channel voice identification section
114 Second-channel voice identification section
115 Third-channel voice identification section
116 Integrated determination section
117 Output terminal

Claims (2)

[Claims]
[Claim 1] A speech recognition method comprising: a process of simultaneously picking up sound with a plurality of microphones to obtain a plurality of input signals; a process of performing voice identification on each of the plurality of input signals independently to obtain a plurality of identification results; and a process of making an integrated judgment on the plurality of identification results.
[Claim 2] The speech recognition method according to claim 1, further comprising: a process of determining, based on the recognition result established by the speech recognition method of claim 1, the main input system whose speech-signal-to-background-noise ratio is the best among the input systems and the noise input system whose ratio is the worst; and a process of constructing an adaptive noise removal filter from the main input system and the noise input system, wherein subsequent recognition processing performs voice identification on the input signal after the adaptive noise removal filtering.
JP08664591A 1991-04-18 1991-04-18 Multi-directional simultaneous voice pickup speech recognition method Expired - Lifetime JP3163109B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP08664591A JP3163109B2 (en) 1991-04-18 1991-04-18 Multi-directional simultaneous voice pickup speech recognition method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP08664591A JP3163109B2 (en) 1991-04-18 1991-04-18 Multi-directional simultaneous voice pickup speech recognition method

Publications (2)

Publication Number Publication Date
JPH04318900A true JPH04318900A (en) 1992-11-10
JP3163109B2 JP3163109B2 (en) 2001-05-08

Family

ID=13892769

Family Applications (1)

Application Number Title Priority Date Filing Date
JP08664591A Expired - Lifetime JP3163109B2 (en) 1991-04-18 1991-04-18 Multi-directional simultaneous voice pickup speech recognition method

Country Status (1)

Country Link
JP (1) JP3163109B2 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3510458B2 (en) 1997-09-05 2004-03-29 沖電気工業株式会社 Speech recognition system and recording medium recording speech recognition control program
WO2009145192A1 (en) * 2008-05-28 2009-12-03 日本電気株式会社 Voice detection device, voice detection method, voice detection program, and recording medium

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH08263258A (en) * 1995-03-23 1996-10-11 Hitachi Ltd Input device, input method, information processing system and management method for input information
US7050974B1 (en) * 1999-09-14 2006-05-23 Canon Kabushiki Kaisha Environment adaptation for speech recognition in a speech communication system
JP2006171077A (en) * 2004-12-13 2006-06-29 Nissan Motor Co Ltd Device and method for voice recognition
JP4608670B2 (en) * 2004-12-13 2011-01-12 日産自動車株式会社 Speech recognition apparatus and speech recognition method
KR100855592B1 (en) * 2007-01-11 2008-09-01 (주)에이치씨아이랩 Apparatus and method for robust speech recognition of speaker distance character
JP2010066360A (en) * 2008-09-09 2010-03-25 Hitachi Ltd Distributed speech recognition system
JP2012145636A (en) * 2011-01-07 2012-08-02 Mitsubishi Electric Corp Speech recognizer and speech recognizing method
JP2016536626A (en) * 2013-09-27 2016-11-24 アマゾン テクノロジーズ インコーポレイテッド Speech recognition with multi-directional decoding
JP2021015084A (en) * 2019-07-16 2021-02-12 Kddi株式会社 Sound source localization device and sound source localization method

Also Published As

Publication number Publication date
JP3163109B2 (en) 2001-05-08

Similar Documents

Publication Publication Date Title
US7447634B2 (en) Speech recognizing apparatus having optimal phoneme series comparing unit and speech recognizing method
US5651094A (en) Acoustic category mean value calculating apparatus and adaptation apparatus
CA2366892C (en) Method and apparatus for speaker recognition using a speaker dependent transform
JP2768274B2 (en) Voice recognition device
JPH02238495A (en) Time series signal recognizing device
JPH11133992A (en) Feature extracting device and feature extracting method, and pattern recognizing device and pattern recognizing method
US5758021A (en) Speech recognition combining dynamic programming and neural network techniques
JP3163109B2 (en) Multi-directional simultaneous voice pickup speech recognition method
JPH0683388A (en) Speech recognition device
Al-Karawi Mitigate the reverberation effect on the speaker verification performance using different methods
Seltzer et al. Speech recognizer-based microphone array processing for robust hands-free speech recognition
KR20030010432A (en) Apparatus for speech recognition in noisy environment
JP2002123286A (en) Voice recognizing method
US5828998A (en) Identification-function calculator, identification-function calculating method, identification unit, identification method, and speech recognition system
JPS63502304A (en) Frame comparison method for language recognition in high noise environments
JP3437492B2 (en) Voice recognition method and apparatus
JP2004318026A (en) Security pet robot and signal processing method related to the device
CN112530452A (en) Post-filtering compensation method, device and system
JPH04324499A (en) Speech recognition device
CN110675890A (en) Audio signal processing device and audio signal processing method
JPH0442299A (en) Sound block detector
JP2975808B2 (en) Voice recognition device
JPH0316038B2 (en)
Seltzer et al. Parameter sharing in subband likelihood-maximizing beamforming for speech recognition using microphone arrays
JP3704080B2 (en) Speech recognition method, speech recognition apparatus, and speech recognition program

Legal Events

Date Code Title Description
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20010213

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20090223

Year of fee payment: 8

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20100223

Year of fee payment: 9

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20110223

Year of fee payment: 10

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20120223

Year of fee payment: 11

EXPY Cancellation because of completion of term