JPS61175700A

JPS61175700A - Voice recognition equipment

Info

Publication number: JPS61175700A
Application number: JP60015435A
Authority: JP
Inventors: 宮芝　晃一
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 1985-01-31
Filing date: 1985-01-31
Publication date: 1986-08-07

Abstract

(57) [Abstract] This publication contains application data that predates electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION [Technical Field] The present invention relates to a speech recognition device, and more particularly to a speech-registration-type recognition device in which the speech matching processing time is shortened.

[Prior Art] A conventional speech recognition device of this type compares the input speech with every pre-registered standard speech pattern in turn, calculates the distance between the two each time, and takes the standard pattern with the smallest distance as the recognition result. Consequently, as the total number of registered words grows, the matching processing time increases and the recognition rate drops noticeably.
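The conventional exhaustive search described above can be sketched as follows. This is only an illustration: the fixed-length feature vectors, the Euclidean distance, and the function name `exhaustive_match` are assumptions, since the text does not specify the distance measure or the pattern format.

```python
import math


def exhaustive_match(input_pattern, registered):
    """Compare the input against every registered pattern and keep the closest.

    registered: dict mapping a word label to its feature vector
    (hypothetical format; all vectors have the same length as the input).
    """
    best_label, best_dist = None, math.inf
    for label, pattern in registered.items():
        # Euclidean distance is an illustrative stand-in for the
        # unspecified distance calculation.
        dist = math.dist(input_pattern, pattern)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist


registered = {"a": [0.0, 0.0], "b": [3.0, 4.0]}
label, dist = exhaustive_match([3.0, 3.0], registered)
```

The loop visits every registered word, which is why the matching time grows with the size of the vocabulary.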

To avoid this, it is considered effective to classify and register the standard speech by concept, such as words, digits, and monosyllables, and, at recognition time, to select the storage unit that holds the relevant vocabulary group and perform strict matching only within it.

The vocabulary group storage unit can be selected or changed by key operation, by voice, and so on. Key operation selects or changes the storage unit reliably, but because key input and voice input must be performed together, operation becomes complicated and places a heavy burden on the user.

The voice-based method requires, in addition to the registered speech patterns themselves, a command for selecting or changing the storage unit that holds them. For this purpose a separate storage unit must be provided for a change-vocabulary group, which consists of speech patterns of the names that represent the individual vocabulary group storage units. That is, the registered speech patterns are divided among several vocabulary group storage units according to the characteristics of the words, and a change command, for example "henkō" ("change"), is registered in each of them. To select or change a vocabulary group storage unit, the user first speaks "henkō"; this is recognized within the vocabulary group storage unit selected at that time, which causes the storage unit for the change-vocabulary group to be selected. The user then speaks the name representing the desired vocabulary group storage unit, which completes the selection or change. However, this method is laborious because it needs two voice inputs for each change. Moreover, because the speech pattern of the change command is registered separately in every vocabulary group storage unit, patterns of the same word with different utterance volume levels and utterance durations end up registered. The same selection or change is therefore recognized differently from one storage unit to another, and in the worst case the storage unit for the change-vocabulary group could not be selected from a particular vocabulary group storage unit.

[Objective] The present invention has been made in view of the above drawbacks of the prior art. Its object is to provide a speech recognition device that, by incorporating the utterance duration of the input speech into the recognition method, shortens the matching processing time for speech recognition and achieves a high recognition rate.

[Embodiment] An embodiment of the present invention is described in detail below with reference to the accompanying drawings.

FIG. 1 is a block diagram of a speech recognition device according to an embodiment of the present invention. In the figure, 1 is a microphone that converts speech into an electrical signal; 2 is a feature extraction unit, consisting of a bank of band-pass filters that divides the 200 to 6000 Hz frequency range into 8 to 30 channels, which extracts features such as power signals and formant signals; 3 is an A/D converter that samples the extracted features every 5 to 10 ms and quantizes them; 4 and 14 are registration/recognition changeover switch means that switch the signal path between standard speech registration and input speech recognition; 5 and 12 are buffer memories that hold the input speech features, during registration and recognition respectively, until the utterance duration of the input speech has been calculated; 6 is a start/end detection circuit that detects the points corresponding to the start and end of a word from the power signal of the input speech; 7 is an utterance length measurement circuit that measures the time from the start point to the end point; 8 is an utterance length selection circuit that generates a selection signal for the vocabulary group storage units 10-1 to 10-n according to the measured utterance duration; 10 is a memory containing the vocabulary group storage units 10-1 to 10-n; 9 is a switch that selects among the vocabulary group storage units 10-1 to 10-n during speech registration; 11 is a switch that selects among the vocabulary group storage units 10-1 to 10-n during speech recognition; 13 is a pattern matching unit that, during speech recognition, compares the input speech pattern with the registered speech patterns read from the selected vocabulary group storage unit; 15 is a general-purpose central processing unit (CPU) that processes the recognition result; 16 is an operation keyboard; and 17 is a display unit that displays recognition results and the like.

The operation of the embodiment is described in detail below.

First, the utterance duration of the input speech is obtained as the time difference between its start point and end point. Various methods could be used to detect the start and end of speech; this embodiment uses the power information P obtained after A/D conversion.

FIG. 2 plots the power information P of the input speech, output every 5 to 10 ms, against time. First, to reject background noise mixed into the input speech, the average noise power in the laboratory is calculated in advance and used as the threshold PN. Next, the threshold for the level of word-initial consonants that are easily devoiced or have low power is denoted PC, and the mean of the two thresholds PN and PC is denoted PM. In addition, the minimum pause time between one input utterance and the next is denoted TP, and the minimum utterance time accepted as input speech is denoted TW.
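As a small worked illustration of these definitions, the sketch below derives PN as the mean of pre-recorded noise power frames and PM as the mean of PN and PC. The function name `derive_thresholds` and the sample values are hypothetical, and how PC itself is chosen is not described in the text, so it is simply passed in.

```python
def derive_thresholds(noise_power_frames, pc):
    """Return (PN, PM) from background-noise power frames and the consonant threshold PC."""
    # PN: average noise power measured in advance.
    pn = sum(noise_power_frames) / len(noise_power_frames)
    # PM: mean of the noise threshold PN and the consonant threshold PC.
    pm = (pn + pc) / 2
    return pn, pm


pn, pm = derive_thresholds([1.0, 3.0], pc=6.0)  # illustrative values only
```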

[Detection of the start point S0] Find the first point at which the power signal P, output every 5 to 10 ms, satisfies P ≥ PM. If the state P ≥ PM then continues for TW or longer, that first point is taken as the start point S0. If the state ends in less than TW, it is regarded as noise, and the same procedure is repeated from the next point at which P ≥ PM.

[Detection of the end point E0] After the start point S0 has been detected, find the first point at which the power signal P satisfies P < PM. If the state P < PM then continues for the time TP or longer, that first point is taken as the end point E0.

The start and end points of the input speech are detected in this way. The utterance length measurement circuit 7 starts a timer when the start/end detection circuit 6 detects the start point S0, stops the timer when the end point E0 is detected, calculates the utterance duration, and sends the value to the utterance length selection circuit 8.

The above operation can be implemented by a microprocessor containing the control program of FIG. 3. In step S1, the timer t is initialized to 0. In step S2, the process waits for the power signal P to reach PM or higher. When P ≥ PM is satisfied in step S2, the process proceeds to step S3 and saves the current value of the timer t in the start register S0. In steps S4 and S5, the process waits for the state P ≥ PM to continue for the time TW or longer. If P ≥ PM ceases to hold in the meantime, the process returns to step S1 and the portion up to that point is treated as noise. If the state continues for TW or longer, the process proceeds to step S6 and the content of the start register S0 is finalized. In step S6, the process waits for P < PM. When P < PM is satisfied in step S6, the process proceeds to step S7 and saves the current value of the timer t in the end register E0. In steps S8 and S9, the process waits for the state P < PM to continue for the time TP or longer. If P < PM ceases to hold in the meantime, the process returns to step S6 and the power signal P up to that point is treated as valid speech. If the state continues for TP or longer, the process proceeds to step S10 and the content of the end register E0 is finalized. In step S10, the interval from time S0 to time E0 is taken as the utterance length Vl.
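The start/end detection procedure can be sketched in software roughly as follows. This is a minimal illustration, not the control program of FIG. 3 itself: it assumes the power values arrive as a list with one value per frame, expresses TW and TP as frame counts, and uses absolute frame indices instead of a resettable timer. The function name `measure_utterance` and its parameters are hypothetical.

```python
def measure_utterance(power, pm, tw, tp):
    """Return (start, end) frame indices of the first utterance, or None.

    power: per-frame power values; pm: threshold PM;
    tw: minimum speech run in frames; tp: minimum pause in frames.
    """
    n = len(power)
    t = 0
    while t < n:
        if power[t] < pm:                 # S2: wait for P >= PM
            t += 1
            continue
        start = t                         # S3: tentative start point
        run = 0
        while t < n and power[t] >= pm and run < tw:   # S4, S5
            run += 1
            t += 1
        if run < tw:                      # too short: treat as noise
            continue
        while t < n:                      # start point is now fixed
            if power[t] >= pm:            # S6: wait for P < PM
                t += 1
                continue
            end = t                       # S7: tentative end point
            quiet = 0
            while t < n and power[t] < pm and quiet < tp:  # S8, S9
                quiet += 1
                t += 1
            if quiet >= tp:               # S10: end point is fixed
                return start, end
            # Short dip: still speech, go back to S6.
        return None                       # data ended before a full pause
    return None


frames = [0, 0, 5, 0, 0, 5, 5, 5, 5, 1, 5, 5, 0, 0, 0, 0]
result = measure_utterance(frames, pm=3, tw=3, tp=3)  # (5, 12)
```

In this example the single loud frame at index 2 is rejected as noise, the dip at index 9 is bridged, and the utterance length is `end - start` frames; multiplying by the frame period (for instance 10 ms, an assumed value within the 5 to 10 ms range given above) converts it to time.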

Next, the storage structure of the memory 10 is described. Table 1 shows the storage structure adopted in this embodiment. The device has vocabulary group storage units 10-1 to 10-n classified by utterance duration. Here, words with utterance durations of 0.4 s to 3 s are adopted as the vocabulary to be recognized, and the storage units of the vocabulary groups are arranged side by side at 0.2 s intervals.
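The mapping from utterance duration to storage unit can be illustrated with a short sketch. The exact layout of Table 1 is not reproduced here, so the bin edges are assumptions: units start at 0.4 s, are 0.2 s wide, and include their lower edge, which is consistent with the worked examples in this description (0.85 s falls in unit 10-3, 0.795 s in unit 10-2). Durations are kept in integer milliseconds to avoid floating-point rounding at the bin edges. The names `select_unit`, `BIN_START_MS`, `BIN_WIDTH_MS`, and `BIN_END_MS` are hypothetical.

```python
BIN_START_MS = 400   # assumed lower edge of unit 10-1
BIN_WIDTH_MS = 200   # 0.2 s interval between units
BIN_END_MS = 3000    # assumed upper limit of the vocabulary (exclusive here)


def select_unit(length_ms):
    """Return the index k of storage unit 10-k for a duration, or None if out of range."""
    if not BIN_START_MS <= length_ms < BIN_END_MS:
        return None
    return (length_ms - BIN_START_MS) // BIN_WIDTH_MS + 1


unit = select_unit(850)  # 3, i.e. storage unit 10-3
```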

Table 1

During standard speech registration, the contacts c of switches 4 and 14 are connected to contacts 4-1 and 14-1, respectively, as shown in FIG. 1. When the utterance length Vl of the input speech is sent to the utterance length selection circuit 8, it is converted there into a selection signal for the vocabulary group storage units 10-1 to 10-n, which are classified by utterance length. The selection signal is sent through contact 14-1 of switch 14 to the registration changeover switch 9, which selects the corresponding vocabulary group storage unit. The feature pattern of the speech held in the buffer memory 5 (for example, the portion from S0 to E0) is stored in the selected vocabulary group storage unit as a standard pattern. In this way, speech patterns of various utterance lengths are stored in the vocabulary group storage units assigned to their utterance lengths.

During speech recognition, the contacts c of switches 4 and 14 in FIG. 1 are connected to contacts 4-2 and 14-2, respectively. The selection signal output from the utterance length selection circuit 8 is therefore sent through contact 14-2 of switch 14 to the recognition changeover switch 11, and the vocabulary group storage unit corresponding to the detected utterance length Vl is selected. The standard patterns in the selected vocabulary group storage unit are then sent one at a time to the pattern matching unit 13, where each is matched against the feature pattern of the input speech held in the buffer memory 12. The standard pattern with the greatest similarity is extracted, and its corresponding code is output to the CPU 15 as the recognition result.
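The registration and recognition flow described above can be sketched as a small class. This is an illustration under stated assumptions, not the circuit of FIG. 1: storage units are modelled as lists keyed by unit index, the bin edges (0.4 s start, 0.2 s width) are assumed, and "greatest similarity" is approximated by the smallest Euclidean distance between fixed-length feature vectors. The class name `DurationIndexedRecognizer` and its methods are hypothetical.

```python
import math
from collections import defaultdict


class DurationIndexedRecognizer:
    """Stores standard patterns in units keyed by utterance duration."""

    def __init__(self, start_ms=400, width_ms=200):
        self.start_ms = start_ms          # assumed lower edge of unit 1
        self.width_ms = width_ms          # assumed unit width
        self.units = defaultdict(list)    # unit index -> [(label, pattern)]

    def unit_for(self, length_ms):
        return (length_ms - self.start_ms) // self.width_ms + 1

    def register(self, label, pattern, length_ms):
        # Registration path: store the pattern in the unit for its duration.
        self.units[self.unit_for(length_ms)].append((label, pattern))

    def recognize(self, pattern, length_ms):
        # Recognition path: match only against the unit for the measured duration.
        candidates = self.units.get(self.unit_for(length_ms), [])
        if not candidates:
            return None
        # Smallest distance stands in for "greatest similarity".
        return min(candidates, key=lambda c: math.dist(pattern, c[1]))[0]


rec = DurationIndexedRecognizer()
rec.register("A", [1.0, 0.0], 850)
rec.register("D", [0.0, 1.0], 900)
rec.register("E", [1.0, 0.0], 1500)
word = rec.recognize([0.9, 0.1], 860)  # only "A" and "D" are compared
```

Because only one unit is searched, the number of comparisons depends on the size of that unit rather than on the whole vocabulary.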

Next, the above operation is explained using a concrete example. Suppose that, at registration, a word A is input, its speech features are stored in the buffer memory 5, and the start/end detection circuit 6 and the utterance length measurement circuit 7 calculate its utterance duration as 0.85 s. From this duration of 0.85 s, the utterance length selection circuit 8 selects the vocabulary group storage unit 10-3 according to Table 1, and the feature pattern of word A in the buffer memory 5 is registered in storage unit 10-3. During speech recognition, storage unit 10-3 is selected by switch 11 through the same operation, and the standard patterns in storage unit 10-3 are matched one after another against the feature pattern of word A held in the buffer memory 12.

Even for the same word, however, if the utterance duration at standard pattern registration differs from that at recognition, the desired vocabulary group storage unit may not be selected at recognition. For example, in terms of Table 1, suppose that word B had an utterance length of 0.795 s at registration and 0.8 s at recognition. Word B is then registered in storage unit 10-2, but recognition matching is performed against the standard patterns in storage unit 10-3, so word B is not recognized. To avoid this problem of utterance length variation, the vocabulary group storage unit is selected at recognition using utterance time information that adds a predetermined variation range to the true utterance length of the input word. For example, if a variation range of ±0.01 s is applied to the true utterance length of 0.8 s for word B at recognition, the utterance length of word B becomes 0.79 s to 0.81 s. This range spans storage units 10-2 and 10-3, so matching is performed first against the standard patterns in storage unit 10-2 and then against those in storage unit 10-3. On the other hand, if word C had an utterance length of 1.05 s at registration and a true utterance length of 1.10 s at recognition, the recognition-time utterance length with the ±0.01 s variation range still falls within the vocabulary group of storage unit 10-4, the same unit as at registration, so matching need only be performed within that unit. A speech recognition device that is robust to utterance length variation can thus be provided. When the device of this embodiment was used to recognize 500 words and the results were compared with the conventional method, the recognition processing time was shortened by 100 to 500 ms and the recognition rate improved by 20% or more, giving an average recognition processing time of 280 ms and a recognition rate of 98.5%.
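The variation-range selection can be illustrated as follows. The sketch returns every storage unit index that the interval "measured length ± tolerance" touches, in ascending order, so that matching can proceed through those units in turn. The bin edges (0.4 s start, 0.2 s width), the ±10 ms default, and the function name `units_for_range` are assumptions chosen to agree with the word B and word C examples; durations are in integer milliseconds.

```python
def units_for_range(length_ms, tolerance_ms=10, start_ms=400, width_ms=200):
    """Return the indices of all storage units overlapped by length_ms +/- tolerance_ms."""
    lo = (length_ms - tolerance_ms - start_ms) // width_ms + 1
    hi = (length_ms + tolerance_ms - start_ms) // width_ms + 1
    lo = max(lo, 1)  # never go below the first unit
    return list(range(lo, hi + 1))


word_b_units = units_for_range(800)    # [2, 3]: search unit 10-2, then 10-3
word_c_units = units_for_range(1100)   # [4]: unit 10-4 only
```

A wider tolerance makes recognition more forgiving of duration changes but enlarges the candidate set, so it trades some of the speed gain for robustness.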

In this embodiment PN was determined from the background noise in a laboratory, but the value of PM can be varied freely to suit whatever noise environment the speech recognition device is used in.

The number of vocabulary group storage units, the capacity of each storage unit, the utterance duration width of each unit, the variation range of the utterance duration considered at recognition, and so on can likewise be varied freely according to the application so that the best recognition results are always obtained. By applying the present invention, a fast and highly reliable voice typewriter can also be constructed.

[Effects] As described above, according to the present invention, adding utterance time information to the speech features makes it possible to shorten the speech recognition processing time. That is, because the pattern matching candidates at recognition are narrowed to a small group by the utterance time information, the recognition processing time is short even when the overall registered vocabulary is large. Furthermore, since utterance time information is itself important information for speech recognition, using it in the recognition processing also helps improve the recognition rate.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of a speech recognition device according to an embodiment of the present invention, FIG. 2 is a diagram showing the power information P of the input speech on a time axis, and FIG. 3 is a flowchart showing the utterance length measurement processing of the embodiment.

Here, 1 ... microphone, 2 ... feature extraction unit, 3 ... A/D converter, 4 ... registration/recognition changeover switch means, 5 ... registration buffer memory, 6 ... start/end detection circuit, 7 ... utterance length measurement circuit, 8 ... utterance length selection circuit, 9 ... registration vocabulary group changeover switch means, 10 ... memory, 10-1 to 10-n ... vocabulary group storage units, 11 ... recognition vocabulary group changeover switch means, 12 ... recognition buffer memory, 13 ... pattern matching unit, 14 ... registration/recognition changeover switch means, 15 ... CPU, 16 ... keyboard, 17 ... display unit.

Patent applicant: Canon Inc.

Claims

[Claims]

(1) A speech recognition device comprising: utterance length detection means for detecting the utterance length of input speech; registered speech storage means for storing standard speech patterns grouped by utterance length; and registered speech reading means for reading the standard speech patterns of the corresponding group from the registered speech storage means based on the utterance length output by the utterance length detection means, wherein speech recognition is performed by sequentially comparing the standard speech patterns read by the registered speech reading means with the input speech pattern.

(2) The speech recognition device according to claim 1, wherein the registered speech reading means determines the corresponding group based on the utterance length detected by the utterance length detection means and a value obtained by adding a predetermined variation range to that utterance length.