JPS6239900A

JPS6239900A - Voice recognition equipment

Info

Publication number: JPS6239900A
Application number: JP60178510A
Authority: JP
Inventors: 宮芝　晃一
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 1985-08-15
Filing date: 1985-08-15
Publication date: 1987-02-20

Abstract

(57) [Summary] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION [Field of Industrial Application] The present invention relates to a speech recognition device, and more particularly to a speech recognition device that shortens both the time needed to read out standard speech patterns and the time needed for speech matching.

[Prior Art] A conventional speech recognition device of this type compares the input speech against every pre-registered standard speech pattern in turn, computes the distance between the two each time, and extracts the pattern with the minimum distance as the recognition result. Consequently, as the recognizable vocabulary grows, the number of registered words grows with it, the time required for recognition increases sharply, and the recognition rate drops noticeably.

One effective way to avoid this is to classify and register the standard speech words by category, such as words, digits, and monosyllables, so that at recognition time the storage section holding the relevant word group can be selected and strict matching is performed only within it.

The word group storage section can be selected or changed by key operation, by voice, or by similar means.

[Problems to Be Solved by the Invention] Key operation selects and changes the word group storage section reliably, but key input and voice input must be performed together, which complicates operation and places a heavy burden on the user.

The voice-based method, in addition to the ordinary groups of registered speech patterns, requires commands for selecting and changing the storage section of those groups. A separate storage section must therefore be provided for a change vocabulary group consisting of speech patterns of the names that represent each word group storage section.

That is, the ordinary speech patterns are divided among several word group storage sections according to the characteristics of the words, and a change command, for example "henkō" ("change"), is built into each of them. To select or change the word group storage section, the user first speaks "henkō"; it is recognized within the word group storage section selected at that moment, and this selects the storage section of the change vocabulary group. The user then speaks the name of the desired word group storage section, which completes the selection or change. This method is laborious, however, because every selection or change requires two voice inputs.

Moreover, because the speech pattern of the change command is registered separately in each word group storage section, patterns of the same word with different loudness levels and utterance durations end up registered. The same selection or change is therefore recognized differently depending on the storage section, and in the worst case the change vocabulary group cannot be selected from a particular word group storage section.

[Means for Solving the Problems] The present invention was made to overcome the above drawbacks of the prior art. As one means of doing so, it comprises, for example: speech pattern storage means storing a plurality of standard speech patterns grouped according to utterance length; speech input means for inputting speech information; utterance length detection means for detecting the utterance length of the speech input from the speech input means; speech pattern reading means for reading out the corresponding standard speech patterns from the speech pattern storage means according to the utterance length detected by the utterance length detection means; and speech recognition means for recognizing the speech by sequentially comparing the standard speech patterns read out by the speech pattern reading means with the input speech pattern.

[Function] In this configuration, incorporating the utterance time information of the input speech into the recognition method allows the speech pattern storage means to be read and written at high speed, shortens the matching time for speech recognition, and yields a high recognition rate.

[Embodiments] An embodiment of the present invention is described in detail below with reference to the accompanying drawings.

FIG. 1 is a block diagram of a speech recognition device according to one embodiment of the present invention.

In the figure, 1 is a microphone that converts speech into an electrical signal; 2 is a feature extraction section, composed of a bank of band-pass filters dividing the 200 to 6000 Hz frequency range into 8 to 30 channels, that extracts features such as the power signal and formant signals; and 3 is an A/D converter that samples the extracted features every 5 to 10 ms and quantizes them. 4 and 14 are registration/recognition changeover switch means that switch the signal path between standard speech registration and input speech recognition; 5 and 12 are buffer memories that hold the input speech features, during registration and recognition respectively, until the utterance duration of the input speech has been calculated; and 6 is a start/end detection circuit that detects, from the power signal of the input speech, the points corresponding to the start and end of a word.

7 is an utterance length measurement circuit that measures the time from the start to the end of the input speech using the detection point information from the start/end detection circuit 6; 8 is an utterance length selection circuit that generates a selection signal for the vocabulary group storage sections 101 to 10n according to the utterance duration measured by the utterance length measurement circuit 7; 10 is a memory containing the vocabulary group storage sections 101 to 10n; 9 is a switch that selects among the vocabulary group storage sections 101 to 10n during speech registration; and 11 is a switch that selects among them during speech recognition. 13 is a pattern matching section that, during speech recognition, compares the input speech pattern with the registered speech patterns read out from the storage section selected among 101 to 10n; 15 is a general-purpose central processing unit (CPU) that processes the recognition results; 16 is an operation keyboard; 17 is a display section that shows recognition results and the like; and 18 is a card writer that stores the standard speech patterns held in the memory 10 on a recording card. 19 is a card reader that, when the device is used, loads into the memory 10 the standard speech patterns previously stored on a recording card by the card writer 18.

In this embodiment a magnetic card is used as the recording card. Compared with a magnetic flexible disk drive or the like, it is therefore compact, easy to handle, and very convenient to use. Needless to say, an optical card or an IC card may be used instead.

The operation of this embodiment, configured as above, is described in detail below.

First, the utterance duration of the speech input from the microphone 1 is obtained as the time difference between the start and the end of the input speech. Various methods are conceivable for detecting the start and end of speech; this embodiment uses the power information P obtained after A/D conversion by the A/D converter 3.

FIG. 2 plots the power information P of the input speech, output from the A/D converter 3 every 5 to 10 ms, with the power P on the vertical axis and time on the horizontal axis.

In FIG. 2, to remove the background noise mixed into the input speech, the average noise power in the laboratory is calculated in advance and used as the threshold PN.

Further, let Pc be a threshold at the level of word-initial consonants that are easily devoiced or have low power, and let PM be the mean of the two thresholds PN and Pc. Let Tp be the minimum pause time between one input utterance and the next, and let Tw be the minimum utterance time accepted as input speech.
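As a rough illustration of how these thresholds relate, the following Python sketch computes PN as the mean of pre-recorded noise power samples and PM as the mean of PN and Pc. The function name, the sample values, and the value of Pc are assumptions made for illustration; the description does not state how Pc itself is obtained.

```python
from statistics import mean

def compute_thresholds(noise_power, pc):
    """Return (PN, PM) from background-noise power samples and the threshold Pc.

    noise_power: power values measured with no speech present (hypothetical data).
    pc: threshold at the level of weak word-initial consonants (assumed to be given).
    """
    pn = mean(noise_power)   # PN: average background-noise power
    pm = (pn + pc) / 2       # PM: mean of the two thresholds PN and Pc
    return pn, pm

# Illustrative values only.
pn, pm = compute_thresholds([2, 4, 6], pc=10)
print(pn, pm)
```

PM sits midway between the noise floor and the weak-consonant level, so it is the single threshold against which the power signal is compared in the start and end detection that follows.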

[Detection of the start point So] First, find the first point at which the power signal P, output from the A/D converter 3 every 5 to 10 ms, satisfies P ≥ PM. If the state P ≥ PM then persists for the time Tw or longer, that first point is taken as the start point So. If the state ends in less than Tw, it is regarded as noise, and the next point at which P ≥ PM is found and the same procedure is repeated.

[Detection of the end point Eo] After the start point So has been detected, find the first point at which the power signal P satisfies P < PM. If the state P < PM then persists for the time Tp or longer, that first point is taken as the end point Eo.

In this way, the start and end of the input speech are detected.

The utterance length measurement circuit 7 starts a timer when the start/end detection circuit 6 detects the start point So, stops it when the end point Eo is detected, calculates the utterance duration, and sends the value to the utterance length selection circuit 8.

The operation described above can be realized by a microprocessor containing the control program shown in FIG. 3.

The utterance duration detection control is described in detail below with reference to the flowchart of FIG. 3.

First, in step S1, the timer t is initialized to "0".

Next, step S2 waits for the power signal P to become PM or greater. When P ≥ PM is satisfied in step S2, the process proceeds to step S3, where the current value of the timer t is saved in the start register So. Steps S4 and S5 then wait for the state P ≥ PM to persist for Tw or longer. If P ≥ PM ceases to hold along the way, the process returns to step S1 and the portion up to that point is treated as noise.

When P ≥ PM has persisted for Tw or longer, the process proceeds to step S6, the content of the start register So is confirmed, and the process waits for P < PM. When P < PM is satisfied in step S6, the process proceeds to step S7, where the current value of the timer t is saved in the end register Eo. Steps S8 and S9 then wait for the state P < PM to persist for Tp or longer. If P < PM ceases to hold along the way, the process returns to step S6; the power signal P up to that point is treated as valid, and the input speech is regarded as still continuing.

When the state has persisted for Tp or longer, the process proceeds to step S10, the end of the input speech is recognized, and the content of the end register Eo is confirmed. In step S10 the interval from time So to time Eo is then fixed as the utterance length V1. The utterance length of the input speech is measured by the above processing.
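The start and end detection of FIG. 3 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the power values arrive as a list with one entry per frame, Tw and Tp are expressed as frame counts, and the function and parameter names are illustrative rather than taken from the patent. A burst of P ≥ PM shorter than Tw is discarded as noise, and a dip of P < PM shorter than Tp is treated as part of the utterance.

```python
def detect_endpoints(power, pm, tw, tp):
    """Return (start, end) frame indices of the utterance, or None if none is complete.

    power: list of power values, one per frame (assumed input format).
    pm: threshold PM; tw: minimum utterance length in frames (Tw);
    tp: minimum trailing pause in frames (Tp).
    """
    n = len(power)
    start = None
    i = 0
    # Start point So: first frame of a run with P >= PM lasting at least Tw frames.
    while i < n:
        if power[i] >= pm:
            j = i
            while j < n and power[j] >= pm:
                j += 1
            if j - i >= tw:
                start = i
                break
            i = j          # run was too short: treat it as noise and keep looking
        else:
            i += 1
    if start is None:
        return None
    # End point Eo: first frame of a run with P < PM lasting at least Tp frames.
    i = start
    while i < n:
        if power[i] < pm:
            j = i
            while j < n and power[j] < pm:
                j += 1
            if j - i >= tp:
                return start, i
            i = j          # pause was too short: speech is still continuing
        else:
            i += 1
    return None            # no sufficiently long pause observed yet

# Illustrative frame powers: a one-frame noise spike, then an utterance with a short dip.
frames = [0, 0, 5, 0, 0, 6, 7, 8, 1, 9, 9, 0, 0, 0, 0]
start, end = detect_endpoints(frames, pm=4, tw=3, tp=3)
frame_ms = 10                              # assumed frame period
utterance_ms = (end - start) * frame_ms    # utterance length V1 in milliseconds
print(start, end, utterance_ms)
```

Working on frame indices rather than a running timer keeps the sketch short; multiplying the index difference by the frame period plays the role of the timer value difference between Eo and So.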

Next, the storage structure of the memory 10, which stores the standard speech patterns and the like, is described. A specific example of the storage configuration of the memory 10 used in this embodiment is shown in the table.

The memory 10 contains vocabulary group storage sections 101 to 10n classified by utterance duration. As shown in the table, words with utterance durations of 0.4 s to 3 s are adopted as the words to be recognized, and each of the vocabulary group storage sections 101 to 10n stores the vocabulary corresponding to one 0.2 s step in utterance duration, starting from 0.4 s.
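The mapping from utterance duration to storage section can be sketched as a simple binning function. The sketch below is an assumption-laden illustration: the table itself is not reproduced in this text, so the half-open intervals (0.4 s up to but not including 0.6 s for the first section, and so on) are inferred from the worked examples later in the description, where 0.795 s falls in section 102 and 0.8 s and 0.85 s fall in section 103. Durations are held as integer milliseconds to avoid floating-point rounding at the bin edges, and the names are hypothetical.

```python
BASE_MS = 400    # shortest utterance covered (0.4 s)
STEP_MS = 200    # width of each group (0.2 s)
MAX_MS = 3000    # longest utterance covered (3 s); treated here as exclusive (assumption)

def group_index(length_ms):
    """Return the 1-based group number n (storage section 10n) for a duration in ms."""
    if not BASE_MS <= length_ms < MAX_MS:
        raise ValueError(f"utterance length {length_ms} ms is outside the covered range")
    return (length_ms - BASE_MS) // STEP_MS + 1

for ms in (850, 795, 800, 1050, 1100):
    print(ms, "ms -> group", group_index(ms))
```

With these assumptions, 850 ms maps to group 3, 795 ms to group 2, and 1050 ms and 1100 ms both map to group 4, which agrees with the examples given for words A, B, and C.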

During standard speech registration, the contacts C of the switches 4 and 14 are connected to the contact 41 side and the contact 141 side, respectively, as shown in FIG. 1. The speech signal to be registered, input from the microphone 1, is set in the buffer memory 5 via the feature extraction section 2 and the A/D converter 3 under the same control as described above. At the same time, the output of the A/D converter 3 is also sent to the start/end detection circuit 6, whose output is fed to the utterance length measurement circuit 7. When the utterance length V1 of the input speech detected by the utterance length measurement circuit 7 is sent to the utterance length selection circuit 8, it is converted there into a selection signal for the vocabulary group storage sections 101 to 10n classified by utterance length. The selection signal is sent via the contact 141 of the switch 14 to the registration-side vocabulary group storage section changeover switch 9, which selects the corresponding storage section. The feature pattern of the speech held in the buffer memory 5 (for example, the portion from So to Eo) is stored in the selected vocabulary group storage section as a standard pattern. In this way, speech patterns of various utterance lengths are stored in the vocabulary group storage sections assigned to each utterance length.

The standard speech patterns registered by each individual in this way are sent to the card writer 18 and stored on a recording card.

At the next use, the recording card holding the individual's standard speech patterns is loaded from the card reader 19 directly into the vocabulary group storage sections of the memory 10, which saves the trouble of registering the standard speech patterns again.

During speech recognition, the contacts C of the switches 4 and 14 in FIG. 1 are connected to the contact 42 side and the contact 142 side, respectively. The output of the A/D converter 3 is therefore set in the buffer memory 12. The selection signal output from the utterance length selection circuit 8 is sent via the contact 142 of the switch 14 to the recognition-side vocabulary group storage section changeover switch 11, and the vocabulary group storage section corresponding to the detected utterance length V1 is selected. The standard patterns in the selected storage section are then sent one by one to the pattern matching section 13, which matches them against the feature pattern of the input speech stored in the buffer memory 12, extracts the standard pattern with the greatest similarity, and outputs its corresponding code to the CPU 15 as the recognition result.
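The registration and recognition paths can be sketched together as a small grouped store. This is an illustration only, not the patent's matching method: the description does not specify the similarity measure, so a squared Euclidean distance over fixed-length feature vectors stands in for it (smallest distance taken as greatest similarity), and the class, method, and label names are hypothetical. The group boundaries follow the assumed 0.4 s base and 0.2 s step, in integer milliseconds.

```python
from collections import defaultdict

BASE_MS, STEP_MS = 400, 200   # assumed group layout: 0.4 s base, 0.2 s steps

def group_index(length_ms):
    """1-based group number for an utterance duration in milliseconds."""
    return (length_ms - BASE_MS) // STEP_MS + 1

def distance(a, b):
    """Squared Euclidean distance; a stand-in for the unspecified similarity measure."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

class GroupedMemory:
    """Standard patterns stored per utterance-length group (plays the role of memory 10)."""

    def __init__(self):
        self.groups = defaultdict(list)

    def register(self, label, features, length_ms):
        # Registration path: store the pattern in the group chosen by its duration.
        self.groups[group_index(length_ms)].append((label, features))

    def recognize(self, features, length_ms):
        # Recognition path: match only against patterns in the selected group.
        candidates = self.groups.get(group_index(length_ms), [])
        if not candidates:
            return None
        label, _ = min(candidates, key=lambda c: distance(c[1], features))
        return label

mem = GroupedMemory()
mem.register("A", [1.0, 2.0], 850)   # group 3
mem.register("D", [5.0, 5.0], 900)   # group 3
mem.register("E", [1.0, 2.0], 500)   # group 1, never compared for an 860 ms input
print(mem.recognize([1.1, 2.1], 860))
```

Because only one group is searched, the number of comparisons depends on the size of that group rather than on the whole vocabulary, which is the source of the speed-up the description claims.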

Next, the above operation is explained using a concrete example.

First, suppose that during registration a word A is input, its speech features are stored in the buffer memory 5, and the start/end detection circuit 6 and the utterance length measurement circuit 7 calculate its utterance duration as 0.85 s. From this value of 0.85 s the utterance length selection circuit 8 selects the vocabulary group storage section 103 according to the table, and the feature pattern of word A in the buffer memory 5 is registered in the storage section 103. During speech recognition, the same operation causes the switch 11 to select the storage section 103, and the standard patterns in the storage section 103 are matched one after another against the feature pattern of word A held in the buffer memory 12.

Even for the same word, however, if the utterance time at standard pattern registration differs from the utterance time at recognition, the intended vocabulary group storage section may not be selected at recognition time.

In terms of the table, for example, suppose the utterance length of word B was 0.795 s at registration and is 0.8 s at recognition. Word B is then registered in the storage section 102, but recognition matching is performed against the standard patterns in the storage section 103, so word B is not recognized. To avoid this problem of utterance length variation, this embodiment selects the vocabulary group storage sections at recognition time using utterance time information that adds a predetermined variation range to the true utterance length. For example, adding a variation range of ±0.01 s to the true utterance length of 0.8 s at recognition gives word B an utterance length of 0.79 s to 0.81 s. Because this range spans the storage sections 102 and 103, matching is performed first against the standard patterns in the storage section 102 and then against those in the storage section 103.

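For word C, on the other hand, suppose the utterance length was 1.05 s at registration and the true utterance length at recognition is 1.10 s. Even with the ±0.01 s variation taken into account, the utterance length at recognition falls within the same word group, in the storage section 104, as at registration, so matching need only be performed within that group. This embodiment thus provides a speech recognition device that is robust to variations in utterance length.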
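The variation-range selection for words B and C can be sketched as follows. This is a minimal sketch under the same assumptions as the binning above (0.4 s base, 0.2 s steps, half-open intervals, integer milliseconds); the function names and the clamping at the first group are illustrative additions, and handling of the upper limit of the table is left out.

```python
BASE_MS, STEP_MS = 400, 200   # assumed group layout: 0.4 s base, 0.2 s steps

def group_index(length_ms):
    """1-based group number for an utterance duration in milliseconds."""
    return (length_ms - BASE_MS) // STEP_MS + 1

def candidate_groups(length_ms, tol_ms=10):
    """Groups to search when a variation range of +/- tol_ms is allowed at recognition."""
    lo = max(1, group_index(length_ms - tol_ms))   # clamp below the first group (assumption)
    hi = max(1, group_index(length_ms + tol_ms))
    return list(range(lo, hi + 1))                 # searched in ascending order

print(candidate_groups(800))    # word B: 0.79 s to 0.81 s straddles a boundary
print(candidate_groups(1100))   # word C: 1.09 s to 1.11 s stays inside one group
```

For word B the range straddles the boundary at 0.8 s, so groups 2 and 3 (sections 102 and 103) are both searched; for word C only group 4 (section 104) is searched, so the extra cost of the tolerance is paid only near group boundaries.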
In the case where the true utterance length at the time of recognition is 1.1O3, the utterance length at the time of recognition considering the fluctuation value of ±0.013 is also within the same word group in the storage unit 104 as at the time of registration. All you have to do is perform matching. As described above, according to this embodiment, it is possible to provide a speech recognition device that is resistant to variations in utterance length.

When 500 words were recognized with the device of this embodiment and the results compared with those of the conventional method, the recognition processing time was shortened by 100 to 500 ms and the recognition rate improved by 20% or more, giving an average recognition processing time of 280 ms and a recognition rate of 98.5%.

In this embodiment PN was determined from the background noise in the laboratory, but the value of PN can be varied freely for any noise environment suited to the application of the speech recognition device. The number of vocabulary group storage sections, the capacity of each storage section, the width of the utterance duration steps, the variation range of utterance duration considered at recognition, and so on can likewise be varied freely according to the application so that the best recognition results are always obtained.

Applying this embodiment to a typewriter makes it possible to build a fast, highly reliable voice typewriter.

The above description uses a magnetic recording card as the recording card and a magnetic card writer and a magnetic card reader as the card writer 18 and the card reader 19. Alternatively, a semiconductor memory (RAM) pack with a built-in backup power supply (battery) or the like may be used to store the registered standard speech pattern information of the memory 10. This yields a compact unit that requires almost no time for reading and writing.

It goes without saying that a magnetic bubble card capable of large-capacity storage, or an optical card, may also be used.

As described above, this embodiment provides a speech recognition device that shortens recognition time by adding utterance time information to the speech features. Because the utterance time information narrows the pattern matching candidates to a small group, recognition processing time is short even when the total registered vocabulary is large. The utterance time information is also itself important information for speech recognition, so using it in the recognition process helps improve the recognition rate as well.

Furthermore, storing the standard speech patterns on a recording card or the like makes the device smaller, simpler, and easier to maintain than storing them on a floppy disk or the like. Standard speech patterns can therefore be saved easily, and by saving them separately for each individual, many people can share a single recognition device. Reading out the standard speech patterns is also easy and fast.

This embodiment thus yields a device that is easy to handle and has a high recognition rate, and applying it widely to consumer products can be especially effective.

[Effects of the Invention] As described above, according to the present invention, performing speech recognition according to the utterance duration information of the input speech makes recognition both fast and accurate.

In addition, recording the standard speech pattern information on a magnetic card, an IC card, or the like in readable form makes it possible to provide compact, easily maintained speech recognition devices such as voice typewriters and word processors.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of a speech recognition device according to one embodiment of the present invention, FIG. 2 is a diagram showing the power information P of the input speech of this embodiment along the time axis, and FIG. 3 is a flowchart showing the operation of the utterance length measurement circuit of this embodiment. Here, 1 ... microphone, 2 ... feature extraction section, 3 ... A/D converter, 4 ... registration/recognition changeover switch means, 5 ... registration buffer memory, 6 ... start/end detection circuit, 7 ... utterance length measurement circuit, 8 ... utterance length selection circuit, 9 ... registration vocabulary group changeover switch means, 10 ... memory, 101 to 10n ... vocabulary group storage sections, 11 ... recognition vocabulary group changeover switch means, 12 ... recognition buffer memory, 13 ... pattern matching section, 14 ... registration/recognition changeover switch means, 15 ... CPU, 16 ... keyboard, 17 ... display section, 18 ... card writer, 19 ... card reader.

Claims

[Claims]

(1) A speech recognition device comprising: speech pattern storage means storing a plurality of standard speech patterns grouped according to utterance length; speech input means for inputting speech information; utterance length detection means for detecting the utterance length of the speech input from the speech input means; speech pattern reading means for reading out the corresponding standard speech patterns from the speech pattern storage means according to the utterance length detected by the utterance length detection means; and speech recognition means for recognizing speech by sequentially comparing the standard speech patterns read out by the speech pattern reading means with the input speech pattern.

(2) The speech recognition device according to claim 1, characterized in that the speech pattern reading means determines the corresponding groups of speech patterns to be read from the utterance length detected by the utterance length detection means and from values obtained by adding a predetermined variation range to that utterance length, and reads out the standard speech patterns of each such group.

(3) The speech recognition device according to claim 1 or 2, characterized in that a magnetic card is used as the speech pattern storage means.

(4) The speech recognition device according to claim 1 or 2, characterized in that an IC card is used as the speech pattern storage means.

(5) The speech recognition device according to claim 1 or 2, characterized in that an optical card is used as the speech pattern storage means.