JPS6118200B2


Info

Publication number: JPS6118200B2
Application number: JP51099114A
Authority: JP
Inventors: Hiroaki Sekoe
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1976-08-18
Filing date: 1976-08-18
Publication date: 1986-05-10
Also published as: JPS5324205A

Description

[Detailed description of the invention]

The present invention relates to improvements in speech recognition devices. A speech recognition device, which recognizes spoken digits, command words, and similar utterances, is useful as a means of entering data into computer systems or of controlling various industrial machines, and the labor savings and efficiency gains achieved by its use are very large.

Research on speech recognition has advanced considerably in recent years, and some moves toward commercialization can be seen, but the recognition performance obtained cannot yet be called sufficient. Pattern matching is often adopted as the operating principle of speech recognition devices. In this method, a reference pattern (hereinafter called the standard pattern) is set in advance for each word of the vocabulary to be recognized. When an unknown utterance (hereinafter called the input pattern) is input, it is compared with each of the standard patterns, and recognition is performed by determining the most similar standard pattern. The pattern matching method is simple in principle, and it can readily follow a change of vocabulary or of speaker simply by changing the standard patterns, but it has the drawback of being unstable against variation in the patterns. One of the major causes of variation in speech patterns is variation in speaking rate.

This is the phenomenon in which the speaking rate changes from one utterance to the next, even when the same person utters the same word, and it results in considerable stretching and compression of the speech pattern along the time axis. In ordinary speech the variation in speaking rate is considered to be about 30% or less. Variation of this degree can be normalized by the pattern matching method based on dynamic programming (hereinafter called the DP matching method) described in Japanese Patent Application No. Sho 45-53896, "Pattern Comparison Device," or by the sequential method described in the specification of Japanese Patent Application No. Sho 44-4542. That is, as long as the speaker takes care to speak normally, accurate recognition can be performed with conventional techniques. In practice, however, fast utterance is often used to raise the data input rate, and in some cases a word may be uttered more than 30% faster than the standard pattern. When a word is uttered this fast, complete normalization becomes impossible even with the DP matching method, and misrecognition results.
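
The time normalization that DP matching performs can be illustrated with a generic dynamic-time-warping sketch. This is not the procedure of the cited applications, only a minimal textbook form; the function name `dtw_distance`, the Euclidean frame distance, and the unrestricted three-way step pattern are assumptions made for illustration.

```python
import math

def dtw_distance(a, b):
    """Accumulated frame distance along the best time alignment of a and b.

    a, b: sequences of feature vectors (lists of floats) of equal dimension.
    """
    inf = float("inf")
    I, J = len(a), len(b)
    # g[i][j] = minimum accumulated distance aligning a[:i] with b[:j]
    g = [[inf] * (J + 1) for _ in range(I + 1)]
    g[0][0] = 0.0
    for i in range(1, I + 1):
        for j in range(1, J + 1):
            d = math.dist(a[i - 1], b[j - 1])
            g[i][j] = d + min(g[i - 1][j], g[i][j - 1], g[i - 1][j - 1])
    return g[I][J]

# Hypothetical one-dimensional patterns: the same shape at two speeds.
slow = [[0.0], [0.0], [1.0], [1.0], [2.0], [2.0]]
fast = [[0.0], [1.0], [2.0]]
print(dtw_distance(slow, fast))  # 0.0
```

Practical DP matchers commonly restrict how far the warping path may deviate from the diagonal. The sketch omits any such restriction, so it does not reproduce the roughly 30% limit discussed above.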

The object of the present invention is to remedy the drawback that conventional speech recognition devices malfunction under large changes in speaking rate, and to realize and provide a speech recognition device that permits accurate data input at whatever rate the speaker desires.

The device according to the present invention comprises a speech rate setting section for setting the speaking rate desired by the speaker, an analysis section configured so that its analysis frame period changes according to a signal from the speech rate setting section, and a recognition section that performs the decision operation based on the analysis results obtained by the analysis section.

The detailed configuration of the present invention is explained below with reference to the drawings. FIG. 1 is a block diagram showing the basic configuration of the present invention. The input speech entering through the microphone 10 is converted by the analysis section 11 into a time series of feature vectors

$$a_i = (a_{1i}, a_{2i}, \ldots, a_{Ki}) \tag{1}$$

namely

$$A = a_1, a_2, \ldots, a_i, \ldots, a_I \tag{2}$$

Here the vectors $a_i$ are time-sampled at the time interval Δt.

Various well-known methods such as frequency analysis, autocorrelation analysis, and linear prediction analysis can serve as the concrete analysis processing; frequency analysis is considered below as a representative example. In this case the vector $a_i$ of equation (1) consists of the analysis outputs of a bank of K-channel band-pass filters, time-sampled at the analysis frame period Δt. The speech rate setting section 12 has a mechanism, such as a graduated dial, with which the speaking rate can be specified by manual operation; the dial setting determines the analysis frame period Δt, which is commanded to the analysis section by the signal d. The recognition section 13 performs recognition by processing the utterance pattern A of equation (2) supplied by the analysis section 11. Various methods are conceivable as the principle of the recognition operation; the pattern matching method is considered here as the simplest example. That is, when the vocabulary to be recognized consists of N words n = 1, 2, ..., N, a standard pattern

$$B^n = b^n_1, b^n_2, \ldots, b^n_j, \ldots, b^n_J \tag{3}$$

is prepared and stored in advance for each word n. Unlike the time axis i of the input pattern A, the time axis j of the standard pattern is assumed to be time-sampled at a fixed analysis frame period Δτ. When an input pattern A is given, it is compared with each of these standard patterns and a similarity is computed. Let the similarity between the standard pattern $B^n$ and the input pattern A be denoted

$$S(A, B^n) \tag{4}$$

Once this similarity has been computed for all standard patterns n = 1 to N, the index $n = \hat{n}$ that maximizes it is determined. The input pattern A is recognized as the word $\hat{n}$, and the recognition result is output as the signal r.
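
The decision rule of equation (4), choosing the word whose standard pattern gives the largest similarity, might be sketched as follows. The patent leaves the similarity measure open, so the `similarity` function here (negative mean Euclidean frame distance over the shorter of the two patterns) is purely an assumption, as are the names `recognize`, `standards`, and `unknown` and the sample data.

```python
import math

def similarity(a, b):
    """Hypothetical S(A, B): negative mean frame distance over the shorter length."""
    n = min(len(a), len(b))
    if n == 0:
        return float("-inf")
    return -sum(math.dist(a[i], b[i]) for i in range(n)) / n

def recognize(input_pattern, standard_patterns):
    """Return the word index n whose standard pattern maximizes S(A, B^n)."""
    return max(standard_patterns,
               key=lambda n: similarity(input_pattern, standard_patterns[n]))

# Two made-up standard patterns, keyed by word index n.
standards = {
    1: [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0]],
    2: [[1.0, 1.0], [1.0, 1.0], [0.0, 0.0]],
}
unknown = [[0.1, 0.9], [0.0, 1.0], [0.9, 0.1]]
print(recognize(unknown, standards))  # 1
```

Any measure that grows as two patterns become more alike could replace `similarity` without changing the decision rule.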

When the device operates on the above principle, the difference between the rate at which the standard pattern was uttered and the rate at which the input pattern is uttered can be corrected by adjusting the analysis frame period Δt of the input pattern.

FIG. 2 illustrates how the effect of a change in speaking rate is corrected. Reference numeral 21 denotes a standard pattern $B^n$, time-sampled at a constant frame period Δτ (for example, 20 ms). Reference numeral 22 is an example of an input pattern in which the same word as the standard pattern is uttered at twice the speed (half the duration). If it were time-sampled at the same analysis frame period Δτ as the standard pattern, the result would be a pattern half as long as the standard pattern. If instead the analysis frame period of the input pattern is halved, the resulting input pattern A has the same length as the standard pattern $B^n$, and the effect of the change in speaking rate is cancelled. Of course, a speaker cannot deliberately control the speaking rate to be exactly doubled. Consequently, even when the analysis frame rate of the input pattern is doubled (that is, the frame period is halved), the length of the input pattern A may differ somewhat from that of the corresponding standard pattern. With a pattern matching method such as the DP matching method mentioned above, however, a difference in duration of this degree can be normalized and causes no problem. Even if a high-precision time normalization such as the DP matching method is not adopted, and simple pattern matching is performed after aligning both patterns to the length of the shorter one and discarding the surplus portion of the longer one, the analysis frame period control of this scheme still yields an approximate time-normalization effect.
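
The arithmetic behind FIG. 2 can be checked with a short sketch. The utterance durations (600 ms and 300 ms) and the function names are hypothetical; only the 20 ms reference period and the factor of two come from the text.

```python
def frame_count(duration_ms, frame_period_ms):
    """Number of frames obtained when an utterance is sampled every frame_period_ms."""
    return int(duration_ms // frame_period_ms)

def scaled_frame_period(reference_period_ms, rate_factor):
    """Frame period for speech uttered rate_factor times faster than the reference."""
    return reference_period_ms / rate_factor

reference_frames = frame_count(600, 20)   # standard pattern sampled at 20 ms
fast_same_period = frame_count(300, 20)   # same word at double speed, period unchanged
fast_scaled = frame_count(300, scaled_frame_period(20, 2.0))  # period halved to 10 ms
print(reference_frames, fast_same_period, fast_scaled)  # 30 15 30
```

With the period halved, the fast utterance yields as many frames as the standard pattern, which is the cancellation described above.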

With the configuration described above, the speaker can set in advance the speaking rate that he or she prefers (or needs), and as a result even speech uttered much faster (or much more slowly) than normal is recognized correctly.

FIG. 3 is a block diagram showing one example of a mechanism for controlling the analysis frame period in the analysis section 11.

The input speech signal x is analyzed into N-channel analysis bands by the frequency analysis section 110, which consists of a bank of band-pass filters and level detectors, and is output as the signals $a_1, a_2, \ldots, a_N$. The speech rate setting section 12 is built as a graduated dial mechanism: when the dial is set to graduation 1, the numerical value 10 is output as the signal d, and each time the dial reading is increased or decreased by 0.1, the signal d increases or decreases by 1. The reference clock generating circuit 116 generates clock pulses Cl with a period of 200 μs. The adder circuit 113 computes the arithmetic sum of the value given by the signal d and the value held in the register 114. This sum is written into the register 114 in synchronism with the clock pulse Cl. The contents of the register 114 therefore keep increasing by the value specified by the signal d at every period of the clock pulse Cl, that is, every 200 μs.

The decoding circuit 115 generates a frame clock pulse fcl at the instant the contents of the register 114 exceed 1000. This frame clock pulse fcl resets the contents of the register 114 to 0, after which the addition every 200 μs is repeated. Consequently, when the value specified by the signal d is 10, frame clock pulses fcl are generated with a period of

$$(1000 / 10) \times 200\ \mu\text{s} = 20\ \text{ms}$$

It is evident that when the value specified by the signal d is α times 10, the period of the frame clock pulse fcl becomes approximately 1/α of the 20 ms obtained for d = 10.
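
The accumulator formed by the adder circuit 113, register 114, and decoding circuit 115 can be simulated in a few lines to check the 20 ms figure and the approximate 1/α relation. The function names are illustrative. One detail is an assumption: the sketch fires the frame pulse when the register reaches 1000, rather than strictly exceeding it, so that d = 10 gives exactly the 20 ms stated in the text.

```python
def dial_to_d(dial):
    """Map the dial reading to the signal d: 1.0 -> 10, one unit of d per 0.1 step."""
    return round(dial * 10)

def frame_periods_us(d, n_pulses, clock_us=200, threshold=1000):
    """Return the intervals (in microseconds) between successive frame clock pulses."""
    if d <= 0:
        raise ValueError("d must be positive")
    register = 0
    elapsed = 0
    periods = []
    while len(periods) < n_pulses:
        elapsed += clock_us
        register += d                # adder output latched on each clock pulse Cl
        if register >= threshold:    # assumption: fire on reaching the threshold
            periods.append(elapsed)
            register = 0             # the frame clock pulse resets the register
            elapsed = 0
    return periods

print(frame_periods_us(dial_to_d(1.0), 3))  # [20000, 20000, 20000]
print(frame_periods_us(dial_to_d(2.0), 3))  # [10000, 10000, 10000]
print(frame_periods_us(dial_to_d(1.5), 3))  # [13400, 13400, 13400]
```

The last case shows why the relation is only approximate: 1000 is not a multiple of 15, so the period is 67 clock intervals (13.4 ms) rather than 20 ms / 1.5.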

Each time a frame clock pulse fcl is applied, the multiplexer 111 scans the outputs $a_1$ to $a_N$ of the frequency analysis section 110, time-samples them, and sends them to the A/D converter 112. The output signal a of the A/D converter thus consists of the group of digital signals for the analysis results $a_1$ to $a_N$, obtained in synchronism with the frame clock pulse fcl.
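
What happens on one frame clock pulse, scanning the channel outputs and digitizing each, might be modeled as below. The patent does not give the converter's resolution or input range, so the 8-bit depth, the full-scale value of 1.0, and the function names are assumptions chosen only to make the sketch concrete.

```python
def quantize(level, full_scale=1.0, bits=8):
    """Clamp level to [0, full_scale] and map it to an unsigned integer code."""
    clamped = min(max(level, 0.0), full_scale)
    return round(clamped / full_scale * (2 ** bits - 1))

def sample_frame(channel_levels, full_scale=1.0, bits=8):
    """One frame: scan every channel output in order and digitize it."""
    return [quantize(level, full_scale, bits) for level in channel_levels]

# Hypothetical level-detector outputs for four channels at one frame instant.
levels = [0.0, 0.25, 1.0, 1.2]
print(sample_frame(levels))  # [0, 64, 255, 255]
```

Calling `sample_frame` once per frame clock pulse would produce the sequence of feature vectors that forms the input pattern.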

The above realizes a circuit in which the analysis frame period is set shorter as the speaking rate is set faster.

In the above example, if the standard patterns $B^n$ are built into the recognition section permanently, they need only be time-sampled at a fixed analysis frame period Δτ (for example, 20 ms). Usually, however, when the speaker changes, standard patterns are registered in the new speaker's own voice so as to follow that speaker's individual characteristics.

The embodiment of FIG. 4 is a block diagram of a speech recognition device configured so that the analysis frame period is controlled simply and reliably in each of two modes: registering standard patterns and actually entering data. In this figure the analysis section is shown divided into an analysis processing section 40 and an analysis frame pulse generating section 41. The adder circuit 113, register 114, decoding circuit 115, and reference clock generating circuit 116 of the block diagram of FIG. 3 correspond to the analysis frame pulse generating section 41, and the frequency analysis section 110, multiplexer 111, and A/D converter 112 correspond to the analysis processing section 40. The recognition section consists of a standard pattern storage section 43, an input pattern buffer 44, a similarity calculation section 45, and a maximum value detection section 46. The speech rate setting section 12 is the same as that described for FIG. 3. The reference analysis frame period generating section 42 generates pulses with a fixed period Δτ (for example, 20 ms). The changeover switches 47 and 48 operate in conjunction. In the mode for registering standard patterns, both changeover switches 47 and 48 are connected to contact 1. Because the changeover switch 47 is connected to contact 1, the period of the analysis frame clock pulse fcl supplied to the analysis processing section 40 is Δτ (for example, 20 ms).

The standard pattern uttered at this time is therefore time-sampled at the analysis frame period Δτ. The standard patterns $B^n$ (n = 1 to N) obtained in this way are sent through the changeover switch 48 to the standard pattern storage section 43 and stored there. In the mode for actually entering data, both changeover switches 47 and 48 are connected to contact 2. The analysis frame pulse fcl supplied to the analysis processing section 40 then comes from the analysis frame pulse generating section 41, so that in the data entry mode the period of the analysis frame pulse fcl is controlled by the speech rate setting section 12. The speaker can therefore set the analysis frame period Δt to suit the speaking rate he or she desires. The input pattern A, time-sampled at the appropriate analysis frame period, is sent through the changeover switch 48 to the input pattern buffer 44 and submitted to recognition processing. The concrete processing used for recognition is not directly the concern of the present invention, but one example is as follows. The similarity calculation section 45 compares the input pattern A stored in the input pattern buffer 44 with the standard patterns $B^n$ (n = 1 to N) stored in the standard pattern storage section 43 and computes the similarities $S(A, B^n)$, n = 1 to N. The maximum value detection section 46 determines the word $n = \hat{n}$ for which the similarity $S(A, B^n)$ is greatest and outputs it as the recognition result r.
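
The role of the two ganged switches in FIG. 4 can be summarized in a small sketch: one selects the source of the frame clock, the other the destination of the analyzed pattern. All names here (`Mode`, `frame_period_ms`, `route_pattern`) are hypothetical, and the input-mode period uses the nominal relation (1000 / d) × 200 μs from the FIG. 3 circuit, ignoring rounding effects.

```python
from enum import Enum

class Mode(Enum):
    REGISTER = 1   # switches 47 and 48 on contact 1
    INPUT = 2      # switches 47 and 48 on contact 2

REFERENCE_PERIOD_MS = 20.0   # fixed period of the reference generator (example value)

def frame_period_ms(mode, d):
    """Switch 47: choose which pulse source drives the analysis processing section."""
    if mode is Mode.REGISTER:
        return REFERENCE_PERIOD_MS   # dial setting is ignored while registering
    return 200.0 / d                 # (1000 / d) clock periods of 0.2 ms each

def route_pattern(mode, pattern, word, standard_store, input_buffer):
    """Switch 48: send the analyzed pattern to storage or to the input buffer."""
    if mode is Mode.REGISTER:
        standard_store[word] = pattern
    else:
        input_buffer.append(pattern)

store, buffer = {}, []
route_pattern(Mode.REGISTER, [[0.1], [0.2]], 1, store, buffer)
route_pattern(Mode.INPUT, [[0.1], [0.3]], None, store, buffer)
print(frame_period_ms(Mode.REGISTER, 20), frame_period_ms(Mode.INPUT, 20))  # 20.0 10.0
```

Because both functions take the same `mode` argument, a single mode change moves both switches together, which mirrors the interlocking that the embodiment relies on.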

With the configuration of this embodiment, the analysis frame period is automatically set to the fixed reference analysis frame period Δτ (for example, 20 ms) in the mode for registering standard patterns. If the changeover switch 47 were absent and the pulses from the analysis frame pulse generating section 41 were sent directly to the analysis processing section, the speaker registering a standard pattern would have to operate not only the changeover switch 48 but also the speech rate setting section 12, adjusting it to give the reference analysis frame period Δτ (for example, 20 ms). When two separate operations are required in this way, one of them is easily forgotten, and operating errors tend to occur. The configuration of the embodiment of FIG. 4 avoids this problem completely.

The configuration of the present invention has been described above on the basis of embodiments, but these descriptions do not limit the scope of the invention. In particular, a circuit that makes the analysis frame period variable can be realized in various ways and is not limited to the example of FIG. 3. The present invention places no particular restriction on the concrete configuration of the recognition section. Furthermore, although FIG. 3 shows an example in which the speech analysis is performed by frequency analysis, the principle of the present invention can also be applied to other analysis methods such as autocorrelation analysis and formant analysis.

[Brief description of the drawings]

FIG. 1 is a diagram showing the basic configuration of the present invention, in which 10 is a microphone, 11 is an analysis section, 12 is a speech rate setting section, and 13 is a recognition section. FIG. 2 is a diagram showing the principle of the present invention. FIG. 3 is a diagram showing one embodiment of the present invention, in which 110 is a frequency analysis section, 111 is a multiplexer, 112 is an A/D converter, 113 is an adder circuit, 114 is a register, 115 is a decoding circuit, and 116 is a reference clock generating circuit. FIG. 4 is a diagram showing a second embodiment of the present invention, in which 40 is an analysis processing section, 41 is an analysis frame clock generating section, 42 is a reference analysis frame clock generating section, 43 is a standard pattern storage section, 44 is an input pattern buffer, 45 is a similarity calculation section, 46 is a maximum value detection section, 12 is a speech rate setting section, and 47 and 48 are mutually interlocked changeover switches.

Claims

[Claims]

1. A speech recognition device comprising an analysis section that analyzes an input speech signal and a recognition section that makes a decision by processing the analysis result signal, characterized in that the device has a speech rate setting section for setting the speaking rate desired by the speaker, and in that the analysis frame period of the analysis section can be controlled by a signal from the speech rate setting section.