JPH0262879B2

Info

Publication number: JPH0262879B2
Application number: JP57107871A
Authority: JP
Inventors: Kyoshi Tajima; Masayuki Iida; Hiroki Oonishi
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 1982-06-22
Filing date: 1982-06-22
Publication date: 1990-12-26
Also published as: JPS58224398A

Description

DETAILED DESCRIPTION OF THE INVENTION

(a) Field of Industrial Application
The present invention relates to a speech recognition device.

(b) Prior Art
When speech is recognized in order to control some piece of equipment, the response time, most of which is taken up by the recognition operation, is almost always a problem. The usual way to shorten the response time is to use a CPU, memory, or peripheral circuits capable of high-speed operation, but this makes the whole system expensive and still does not yield much of a speed-up.

The operation of existing speech recognition is shown in FIG. 1. One cycle consists of a speech input period I₁ during which the speech to be recognized is input, a silent period N₁ that marks the end of the input speech, and a recognition period R₁ during which the input speech is recognized. The next recognition cycle I₂, N₂, R₂ starts immediately after the cycle I₁, N₁, R₁. As long as the speech input period I, the silent period N, and the recognition period R are arranged in series in this way, there is a limit to how far the recognition operation can be sped up, as noted above.

In view of this, the present inventors previously proposed a faster recognition device, called a pseudo-continuous recognition system (Japanese Patent Application No. 57-74932). As shown in FIG. 2, it uses separate processors so that the speech inputs I₁, I₂, I₃, ... and the recognition R₁, R₂, R₃, ... of those inputs are carried out in parallel.

However, as FIG. 2 also makes clear, when the time taken by the speech inputs I₁, I₂, I₃, ... is compared with the time taken by the recognition R₁, R₂, R₃, ..., the recognition generally takes longer. As a result, a waiting time T arises in the processor responsible for input, and that time is wasted. This was not much of a problem with recognition based on linear matching, because the recognition time was short. With methods such as DP matching, which were developed to improve the recognition rate, recognition takes considerably longer and the waste represented by the waiting time T grows.

(c) Problems to be Solved by the Invention
The present invention addresses this new problem. It provides a speech recognition device that shortens the waiting time between the two processors arranged in parallel as described above, and thereby reduces wasted time.

(d) Means for Solving the Problems
In the speech recognition device of the present invention, a first processor section performs the speech signal capture operation and a preliminary comparison recognition operation, and a second processor section performs a full-scale comparison recognition operation. The preliminary comparison recognition in the first processor section takes less time than the full-scale comparison recognition in the second processor section, and the time required for speech signal capture plus preliminary comparison recognition in the first processor section is set approximately equal to the time required for full-scale comparison recognition in the second processor section.

(e) Operation
In the speech recognition device of the present invention, the preliminary comparison recognition in the first processor section takes less time than the full-scale comparison recognition in the second processor section, and the time required for speech signal capture plus preliminary comparison recognition in the first processor section is set approximately equal to the time required for full-scale comparison recognition in the second processor section. Consequently, the second processor can carry out full-scale comparison recognition on the capture and preliminary comparison recognition results that the first processor delivers one after another, with almost no waiting time.

(f) Embodiment
FIG. 3 is a block diagram showing a specific configuration of the device of the present invention. Reference numeral 1 denotes a microphone that converts speech into an electrical speech signal. Reference numeral 2 denotes a feature extraction circuit that extracts the features of the speech signal from the microphone 1. It can use, for example, the frequency spectrum parameter extraction technique based on a bank of band-pass filters described in Japanese Patent Laid-Open No. 54-145407, and it outputs a speech feature pattern consisting of a time series of these parameters. The feature pattern generally used is this parameter time series normalized to a specific number of samples. Reference numeral 3 denotes a first buffer memory that stores the feature pattern of the input speech obtained from the feature extraction circuit 2.

Reference numeral 4 denotes a first reference pattern memory holding a large number of reference patterns against which the feature pattern is preliminarily compared. The reference patterns stored here have the same format as the feature pattern of the input speech with which they are compared: a sequence of frequency spectrum parameters normalized to a fixed number of time-series samples. Reference numeral 5 denotes a first recognition circuit that performs preliminary recognition by comparing the feature pattern of the input speech with the many reference patterns. Preliminary recognition here does not mean picking out the single reference pattern in the first reference pattern memory 4 that is most similar to the feature pattern of the input speech. It means picking out several patterns that are relatively similar, using a method with a comparatively short recognition time such as linear matching. Reference numeral 6 denotes a first address memory that stores the addresses, in the reference pattern memory 4, of the reference patterns that the first recognition circuit 5 selected as relatively similar to the feature pattern of the input speech.

Reference numeral 7 denotes a first CPU that controls the feature extraction in the feature extraction circuit 2, the loading of the extracted feature pattern into the buffer memory 3, the detection of the end of speech from silence in the input, and the preliminary recognition in the first recognition circuit 5. The microphone 1 through the first CPU 7 constitute a first processor P₁.
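The preliminary recognition described above can be sketched as follows. This is only an illustrative sketch, not the circuit of the embodiment: the function names `normalize_length`, `linear_distance`, and `preselect`, the nearest-frame resampling, and the city-block distance are all assumptions, since the text says only that patterns are normalized to a fixed number of samples and compared by linear matching.

```python
def normalize_length(frames, n_samples):
    """Resample a sequence of spectrum frames to n_samples frames (assumed method)."""
    if not frames:
        raise ValueError("empty pattern")
    if n_samples == 1:
        return [list(frames[0])]
    last = len(frames) - 1
    out = []
    for i in range(n_samples):
        # Map output index i onto the input time axis and take the nearest frame.
        src = round(i * last / (n_samples - 1))
        out.append(list(frames[src]))
    return out


def linear_distance(a, b):
    """Frame-by-frame city-block distance between two equal-length patterns."""
    if len(a) != len(b):
        raise ValueError("patterns must be normalized to the same length")
    return sum(abs(x - y) for fa, fb in zip(a, b) for x, y in zip(fa, fb))


def preselect(feature, references, k):
    """Return the addresses (indices) of the k references closest to feature."""
    ranked = sorted(range(len(references)),
                    key=lambda addr: linear_distance(feature, references[addr]))
    return ranked[:k]
```

The list returned by `preselect` plays the role of the contents of the first address memory 6: several candidate addresses rather than a single best match.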

Reference numeral 10 denotes a second buffer memory to which the feature pattern of the input speech stored in the buffer memory 3 of the first processor P₁ is transferred. Reference numeral 11 denotes a second reference pattern memory holding the reference patterns used for full-scale recognition of the input speech. These may be the same as those used in the first processor P₁. Alternatively, so that the second processor P₂ can perform more accurate comparison recognition, the reference patterns in the second reference pattern memory may have more samples than those in the first, giving higher-precision reference patterns. Reference numeral 12 denotes a second address memory to which the addresses held in the first address memory 6 are transferred, that is, the addresses of those reference patterns in the second reference pattern memory 11 that were preliminarily selected by the preliminary comparison recognition in the first processor P₁.

Reference numeral 13 denotes a second recognition circuit that performs full-scale comparison recognition between the reference patterns designated by the second address memory 12 and the feature pattern of the input speech stored in the second buffer memory 10. It uses a method such as the DP (dynamic programming) method, which takes somewhat longer but performs rigorous recognition. Reference numeral 14 denotes an output port that outputs the speech identified by the second recognition circuit 13 to an external circuit. Reference numeral 15 denotes a second CPU that controls the recognition in the second recognition circuit 13 and the output at the output port 14. The second buffer memory 10 through the second CPU 15 constitute a second processor P₂.
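As a rough illustration of the second stage, the sketch below runs a textbook dynamic-programming (DTW-style) alignment only against the addresses handed over from the first stage. The names `frame_distance`, `dp_distance`, and `full_recognition`, the symmetric step pattern, and the city-block frame distance are assumptions made for the example; the text does not specify the exact DP formulation.

```python
def frame_distance(fa, fb):
    """City-block distance between two spectrum frames (assumed metric)."""
    return sum(abs(x - y) for x, y in zip(fa, fb))


def dp_distance(a, b):
    """Dynamic-programming alignment cost between two frame sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(a[i - 1], b[j - 1])
            # Extend the cheapest of the three allowed predecessor paths.
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]


def full_recognition(feature, reference_memory, candidate_addresses):
    """Compare only against pre-selected addresses; return the best address."""
    return min(candidate_addresses,
               key=lambda addr: dp_distance(feature, reference_memory[addr]))
```

Because `full_recognition` visits only the entries listed in `candidate_addresses` (the role of the second address memory 12), its running time scales with the number of pre-selected patterns rather than with the size of the whole reference pattern memory.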

Next, the operation of the device of the present invention shown in FIG. 3 will be described with reference to FIG. 4.

The speech input period I in FIG. 4 is normally 200 msec to 1500 msec, which is the typical range in word speech recognition. The silent period N is normally 200 msec. A pause within speech, for example at the geminate consonant "tsu" when saying "Tottori", lasts about 150 msec. So that such a pause is not taken as the end of the speech, a value at least 30% longer than the pause, namely 200 msec, is appropriate. The preliminary comparison recognition period SR, when linear matching is used with a vocabulary of 64 words as in this embodiment, will need about 300 msec, although this depends on the processing speed of the hardware.
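The role of the 200 msec silent period can be illustrated with a small end-of-speech detector. This is a sketch under assumptions not found in the text: the input is taken to be a list of per-frame energy values, and the 10 msec frame period, the energy threshold, and the function name `find_utterance_end` are hypothetical.

```python
def find_utterance_end(energies, frame_ms=10, threshold=1.0, silence_ms=200):
    """Return the index of the last speech frame once silence has lasted
    silence_ms, or None if the utterance has not ended yet."""
    needed = -(-silence_ms // frame_ms)  # ceiling division: frames of silence required
    last_speech = None
    silent_run = 0
    for i, energy in enumerate(energies):
        if energy >= threshold:
            last_speech = i
            silent_run = 0          # any speech frame resets the silence counter
        elif last_speech is not None:
            silent_run += 1
            if silent_run >= needed:
                return last_speech  # silence long enough: speech has ended
    return None
```

With these defaults a 150 msec pause (15 silent frames), such as the one in "Tottori", does not end the utterance, whereas 200 msec of silence (20 frames) does.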

The full-scale comparison recognition period MR, when DP matching is used with 20 words (the result of preliminary selection from the above 64 words) as in this embodiment, will need about 600 msec, again depending on the processing speed of the hardware.

First, in the first processor P₁, the speech input from the microphone 1 is taken into the feature extraction circuit 2 as it arrives. When the input speech I₁ stops and a fixed silent period N₁ follows, the first CPU 7 detects the end of the input speech I₁. The feature extraction circuit 2 then extracts the feature pattern of the speech taken in up to that point, and the pattern is stored in the first buffer memory 3. Next, the first recognition circuit 5 performs preliminary comparison recognition SR₁ between the reference patterns stored in the first reference pattern memory 4 and the feature pattern stored in the first buffer memory 3, selects several reference patterns that are relatively similar to the feature pattern, and stores the addresses of the selected patterns in the reference pattern memory 4 in the first address memory 6. As already noted, the preliminary comparison recognition SR₁ by the first recognition circuit 5 does not need to be rigorous, so linear matching, which completes in a comparatively short time, is used. All operations in the first processor P₁ are controlled by the first CPU 7.

When the preliminary recognition SR₁ is complete, the address information that is its result, held in the first address memory 6, is transferred to the second address memory 12 of the second processor P₂. Based on this address information, only the block of patterns selected by the preliminary recognition SR₁ is read out of the second reference pattern memory 11 into the second recognition circuit 13, which performs full-scale comparison recognition MR₁ against the feature pattern of the input speech that has already been transferred from the first buffer memory 3 to the second buffer memory 10. The full-scale comparison recognition MR₁ is a rigorous comparison of the reference patterns with the feature pattern, for which the DP method, for example, is used.

As noted earlier, MR₁ would ordinarily take longer than the preliminary recognition SR₁ in the first processor P₁. However, the reference patterns compared with the feature pattern in the recognition circuit 13 have been pre-selected in the first processor P₁ and are few in number, so recognition takes less time than a comparison against all the reference patterns stored in the reference pattern memory 11. As a result, as shown in FIG. 4, the time required is approximately equal to the sum of the speech capture period I, the silent period N, and the preliminary comparison recognition SR in the first processor P₁. This relationship depends on the length of the speech itself and on how long the speaker takes to utter it, so it includes uncertain factors and has to be regarded as only roughly equal.

For example, for a short speech input (I = 200 msec), the processing time of the first processor P₁ (I + N + SR) is about 700 msec, which is clearly approximately equal to the processing time MR of the second processor P₂ (600 msec).

For a long speech input (I = 1500 msec), on the other hand, the processing time of the first processor P₁ (I + N + SR) is about 2000 msec. This cannot be said to be exactly equal to that of the second processor P₂, but it may be regarded as within the approximately equal range without causing any problem.
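The arithmetic behind the two cases can be checked with a few lines, using the embodiment's figures (N = 200 msec, SR of about 300 msec, MR of about 600 msec). The function names are hypothetical, and treating the per-cycle difference between the two processors as idle time is a simplifying assumption. With I = 1500 msec the sum I + N + SR comes to 2000 msec.

```python
N_MS = 200    # silent period N (from the embodiment)
SR_MS = 300   # preliminary comparison recognition SR, approximate
MR_MS = 600   # full-scale comparison recognition MR, approximate


def first_processor_time(input_ms):
    """Per-utterance time of the first processor: I + N + SR."""
    return input_ms + N_MS + SR_MS


def waiting_times(input_ms):
    """Return (P1 wait, P2 wait) in msec per cycle, assuming whichever
    processor finishes first simply idles until the other is done."""
    p1 = first_processor_time(input_ms)
    return max(0, MR_MS - p1), max(0, p1 - MR_MS)


for input_ms in (200, 1500):
    print(input_ms, first_processor_time(input_ms), waiting_times(input_ms))
```

Under these assumptions the short utterance leaves the second processor idle for about 100 msec per cycle, while the long utterance leaves it idle for about 1400 msec; in neither case does the first processor wait.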

In other words, rather than setting the two times equal for speech with a long input time, setting them equal for speech with a short input time allows continuous recognition to work effectively for a run of short utterances, which is where high-speed processing is most needed.

Meanwhile, in parallel with this, the first processor P₁ begins capturing the next input speech I₂ as soon as the preliminary recognition SR₁ ends. As described above, this capture I₂, the subsequent silent period N₂, and the preliminary recognition SR₂ together take approximately as long as the full-scale recognition MR₁ of the preceding input speech in the second processor P₂. Therefore, by the time the first processor P₁ has completed the capture I₂ and preliminary recognition SR₂ of the second input speech, the second processor P₂ has also completed the full-scale recognition MR₁ of the first input speech. Both processors P₁ and P₂ then move on to their next operations without any waiting time: the first processor P₁ to the capture I₃ of the third input speech, and the second processor P₂ to the full-scale recognition MR₂ of the second input speech.
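The hand-off between the two processors can be traced with a small timeline simulation. This is a simplified sketch, not the control logic of the embodiment: the function name `simulate` is hypothetical, transfers between the memories are assumed to take no time, and the first processor is assumed never to block on the second.

```python
def simulate(inputs_ms, n_ms=200, sr_ms=300, mr_ms=600):
    """Return (p1_done, mr_start, mr_done) in msec for each utterance.

    p1_done  : time at which P1 finishes capture, silence, and SR
    mr_start : time at which P2 begins full-scale recognition MR
    mr_done  : time at which P2 outputs the result
    """
    schedule = []
    p1_free = 0  # time at which P1 can start the next capture
    p2_free = 0  # time at which P2 can start the next MR
    for input_ms in inputs_ms:
        p1_done = p1_free + input_ms + n_ms + sr_ms
        # P2 needs both the pre-selection result and to be idle itself.
        mr_start = max(p1_done, p2_free)
        mr_done = mr_start + mr_ms
        schedule.append((p1_done, mr_start, mr_done))
        p1_free = p1_done  # P1 starts the next capture right after SR
        p2_free = mr_done
    return schedule
```

For three consecutive 200 msec utterances with the embodiment's figures, `simulate([200, 200, 200])` gives `[(700, 700, 1300), (1400, 1400, 2000), (2100, 2100, 2700)]`: each full-scale recognition starts the moment the corresponding pre-selection is ready, and the second processor is idle for only about 100 msec between utterances.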

In the embodiment above, linear matching was described as the preliminary comparison recognition process and DP matching as the full-scale one, but the present invention is not limited to these. For the preliminary process, any technique can be used that narrows down the reference patterns to some extent in a short processing time, even if its recognition accuracy is lower than that of the full-scale process. For example, the preliminary comparison recognition process may use only the local maxima and minima of speech power (Japanese Patent Laid-Open No. 56-55995), and the full-scale comparison recognition process may use recognition functions (Niimi, "Speech Recognition", Kyoritsu Shuppan, October 10, 1979, pp. 117-118).

(g) Effects of the Invention
As is clear from the above description, the present invention divides the recognition of input speech into preliminary recognition and full-scale recognition and provides two processors that operate independently. One processor performs the capture of the input speech and the preliminary recognition, and the other performs the full-scale recognition. The execution times of the two processors can therefore be set approximately equal, although this depends to some extent on the duration of the input speech. Consequently, even if the waiting time in the two processors cannot be eliminated entirely for every input duration, wasted idle time in each processor can be reduced effectively, and speech recognition is sped up as a result.

[Brief explanation of drawings]

FIGS. 1 and 2 are schematic diagrams showing the operation of existing speech recognition devices, FIG. 3 is a block diagram showing the configuration of the device of the present invention, and FIG. 4 is a schematic diagram showing the speech recognition operation according to the present invention.

P₁, P₂: processors; 3, 10: buffer memories; 4, 11: reference pattern memories; 5, 13: recognition circuits; 6, 12: address memories; 7, 15: CPUs.

Claims

[Claims]
1. A speech recognition device comprising: a first processor section consisting of a microphone that converts speech into an electrical speech signal, a feature extraction circuit that captures the speech signal and extracts its features, a first reference pattern memory storing reference patterns for preliminary comparative recognition of the feature pattern extracted by the feature extraction circuit, a first recognition circuit for performing preliminary comparative recognition between the reference patterns and the feature pattern, a buffer memory for storing results, and a first CPU for controlling these feature extraction and preliminary comparison recognition operations; and a second processor section including a second reference pattern memory, a second recognition circuit that carries out full-scale comparative recognition between the feature pattern and those reference patterns in the second reference pattern memory that were preliminarily recognized by the first processor section, an I/O port that outputs the recognition result of the second recognition circuit to an external circuit, and a second CPU that controls the full-scale comparison recognition operation and the I/O port operation; characterized in that the preliminary comparison recognition processing is performed in a shorter time than the full-scale comparison recognition processing in the second recognition circuit, and the time required for the speech signal capture operation and the preliminary comparison recognition operation in the first processor section is set to be approximately equal to the time required for the full-scale comparison recognition operation in the second processor section.