JPS62294298A

JPS62294298A - Voice input unit

Info

Publication number: JPS62294298A
Application number: JP61138538A
Authority: JP
Inventors: 樺澤　哲
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1986-06-13
Filing date: 1986-06-13
Publication date: 1987-12-21

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

3. Detailed Description of the Invention

Field of Industrial Application

The present invention relates to a voice input device that copes with variations in speech rate.

Prior Art

A conventional voice input device of this type is configured as shown in Fig. 3, as described, for example, in "[...] Spoken Japanese," ICASSP-83, pp. 320-323, 1983.

That is, it consists of a speech input terminal 31; a feature extraction unit 32 that converts the input speech signal into an input pattern consisting of a series of feature vectors; a syllable standard pattern storage unit 33 that stores syllable standard patterns; a vowel identification unit 34 that detects and identifies the vowel portions of the input pattern; a pattern matching unit 35 that uses DP matching to obtain, while expanding and contracting the time axis, the inter-pattern distance between partial patterns of the input pattern and the syllable standard patterns; a syllable string determination unit 36 that determines the syllable string of the input speech by finding the sequence of syllable standard patterns that minimizes the cumulative inter-pattern distance obtained by the pattern matching unit 35; and a recognition result output terminal 37. The device identifies the vowel portions of the input speech, obtains the inter-pattern distance between the partial pattern for each vowel portion and the syllable standard patterns that have the same vowel as the identification result, and recognizes the input speech by taking the sequence of syllable standard patterns with the minimum cumulative inter-pattern distance as the syllable string of the input speech.

Problems to Be Solved by the Invention

However, when input speech is recognized with a speech recognition device of this configuration, the time axis is expanded and contracted by DP matching, but if the speech rate of the input is extremely slow or extremely fast, vowel identification inserts extra vowel portions or fails to detect vowel portions and drops them, and recognition accuracy deteriorates.

Therefore, in the present invention, when the speech rate of the input is extremely slow or extremely fast, the recognition device issues an instruction to the speaker such as "Please speak a little faster" or "Please speak a little more slowly," thereby keeping variations in the speaker's speech rate as small as possible and preventing deterioration of recognition accuracy.

Means for Solving the Problems

The technical means of the present invention for solving the above problem is the provision of a speech rate instruction unit that gives instructions regarding the speech rate in order to control it.

Operation

This technical means operates as follows.

That is, when the speech rate is extremely slow or extremely fast, the speech rate instruction unit issues an instruction such as "Please speak a little faster" or "Please speak a little more slowly," which corrects the speech rate and keeps it constant.

As a result, the recognition device avoids the insertion of extra syllables caused by an extremely slow speech rate and the dropping of syllables caused by an extremely fast speech rate, so that deterioration of recognition accuracy due to variations in speech rate is prevented.

Embodiments

Embodiments of the present invention are described below, but first a word speech recognition device based on pattern matching is explained. The general configuration of such a device is as follows.

It consists of feature extraction means that converts the input speech signal into a series of feature vectors by a filter bank, frequency analysis, LPC analysis, or the like; standard pattern storage means in which the series of feature vectors extracted by the feature extraction means from utterances made in advance are registered as standard patterns for all words to be recognized; pattern comparison means that calculates the similarity or distance, as series of feature vectors, between the input pattern extracted by the feature extraction means from the utterance to be recognized and every standard pattern stored in the standard pattern storage means; and determination means that outputs, as the recognition result, the word corresponding to the standard pattern with the highest similarity (smallest distance) in the pattern comparison.

Even when the same speaker utters the same word, the utterance duration differs each time. When the pattern comparison means compares a standard pattern with the input pattern, it is therefore necessary to expand or contract the time axes of both so that their pattern lengths match before comparing them. Because the change in utterance duration does not occur uniformly across the parts of the uttered word, each part must be expanded or contracted non-uniformly.

Fig. 4 illustrates this. In Fig. 4a, the horizontal axis is the i coordinate corresponding to the input pattern A = a_1, a_2, ..., a_I (a_i is the feature vector of the i-th frame of the input pattern), and the vertical axis is the j coordinate corresponding to the standard pattern R^n = r^n_1, r^n_2, ..., r^n_J_n (r^n_j is the feature vector of the j-th frame of the standard pattern R^n). Matching the input pattern A with the standard pattern R^n under non-linear expansion and contraction of the time axis means finding, on this lattice graph, the path 1 that expresses the correspondence between the feature vectors of the two patterns under the criterion that the distance between the two patterns as sequences is minimized, and taking the distance at that point as the distance between the two patterns. A well-known method for performing this calculation efficiently uses dynamic programming and is called DP matching.

When this path is determined, restrictions are imposed that take the nature of speech into account. Fig. 4b is an example of a path selection condition called a slope constraint. In this example, the path to point (i, j) can only be one of three: the path from point (i-2, j-1) through point (i-1, j), the path from point (i-1, j-1), or the path from point (i-1, j-2) through point (i, j-1). If the further condition is imposed that the start points and end points of the input pattern and the standard pattern must correspond, the matching path is confined to the hatched region in Fig. 4. This restriction prevents overly extreme alignments, on the grounds that however much the time axis stretches or shrinks, the same word should not stretch or shrink that drastically.

The distance between the two sequences is defined as the weighted average, along the path, of the inter-vector distances d^n(i, j) between the input vectors a_i and the standard pattern vectors r^n_j. If the sum of the weights along the path is made constant regardless of which path is chosen, the DP matching technique can be used.
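The DP matching described above can be sketched as follows. This is a minimal illustration, not the patent's exact recurrence: the three admissible predecessors follow the slope constraint in the text, but the weights (2 on a diagonal step, 2 then 1 on the two-step paths) are an assumed symmetric choice, picked only because they make the sum of weights along every path equal to I + J. The function name `dp_matching` and the 0-based indexing are likewise assumptions.

```python
import math


def dp_matching(dist):
    """Slope-constrained DP matching over a precomputed distance table.

    dist[i][j] is the inter-vector distance between input frame i and
    standard-pattern frame j (0-based). Returns the cumulative distance
    normalized by the constant weight sum I + J, or math.inf when the
    slope constraint admits no path from the start to the end point.
    The weights are an assumed symmetric form, not the patent's figure.
    """
    I, J = len(dist), len(dist[0])
    D = [[math.inf] * J for _ in range(I)]
    D[0][0] = 2 * dist[0][0]  # start points are forced to correspond
    for i in range(I):
        for j in range(J):
            if i == 0 and j == 0:
                continue
            d = dist[i][j]
            best = math.inf
            if i >= 1 and j >= 1:
                # direct diagonal step from (i-1, j-1)
                best = min(best, D[i - 1][j - 1] + 2 * d)
            if i >= 2 and j >= 1:
                # from (i-2, j-1) through (i-1, j)
                best = min(best, D[i - 2][j - 1] + 2 * dist[i - 1][j] + d)
            if i >= 1 and j >= 2:
                # from (i-1, j-2) through (i, j-1)
                best = min(best, D[i - 1][j - 2] + 2 * dist[i][j - 1] + d)
            D[i][j] = best
    # end points are forced to correspond
    return D[I - 1][J - 1] / (I + J)
```

Because every admissible step is built from these three moves, a pattern pair whose lengths differ by more than a factor of about two has no path at all and the function returns infinity, which is the effect of confining the path to the hatched region.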

Fig. 5 illustrates the matching between the input pattern and a word standard pattern constructed by concatenating monosyllable speech standard patterns.

In the figure, R^q(1), R^q(2), and R^q(3) denote the standard patterns of the monosyllables q(1), q(2), and q(3). The example shows the input pattern being matched against the standard pattern of a word composed of the monosyllables q(1), q(2), and q(3).

Following the explanation above, the matching path becomes, for example, path 2.

Embodiments of the present invention that use the pattern matching technique described above are explained below.

Fig. 1 is a block diagram showing one embodiment of the present invention. In the figure, 111 is an input terminal for the speech signal, and 112 is a feature extraction unit, built from a filter bank or the like, that converts the input speech signal into a series of feature vectors. 113 is a syllable standard pattern storage unit, which stores the standard pattern of each syllable converted into a series of feature vectors. The syllable standard patterns may be defined as monosyllable standard patterns only. Alternatively, to take account of coarticulation that arises when monosyllables are uttered continuously (the phenomenon in which the feature vectors of a monosyllable uttered in continuous speech change, relative to those of the same monosyllable uttered in isolation, under the influence of the preceding and following sounds), they may be defined as monosyllable standard patterns plus VCV syllable standard patterns (V: vowel, C: consonant). The explanation below assumes monosyllable standard patterns only. When both monosyllable and VCV syllable standard patterns are defined, monosyllable standard patterns alone suffice for monosyllable recognition, while for word recognition VCV syllables can be used in addition to monosyllable standard patterns, which resolves the coarticulation problem.

114 is an inter-vector distance calculation unit, which calculates the inter-vector distance d^n(i, j) between the vectors r^n_j making up the standard patterns R^n in the syllable standard pattern storage unit 113 and the vectors a_i making up the input pattern A. Writing a_i = (a_i1, a_i2, ...) and r^n_j = (r^n_j1, r^n_j2, ...), d^n(i, j) is most simply given by an expression that is missing from this text. 115 is an inter-vector distance storage unit, which stores the results calculated by the inter-vector distance calculation unit 114. 116 is a word dictionary, prepared in advance by entering the vocabulary, for example from a keyboard, as the strings of monosyllable symbols (for example, character strings) that make up each word.
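The roles of the inter-vector distance calculation unit and the inter-vector distance storage unit can be sketched as below. The distance expression itself is not recoverable from the text, so the city-block distance used here is purely an assumption chosen for simplicity; the names `city_block` and `build_distance_store` are hypothetical.

```python
def city_block(a, r):
    """Assumed inter-vector distance: sum of absolute component differences."""
    return sum(abs(x - y) for x, y in zip(a, r))


def build_distance_store(input_pattern, syllable_templates):
    """Compute d^n(i, j) once for every syllable standard pattern.

    input_pattern: list of feature vectors a_i.
    syllable_templates: dict mapping a syllable symbol to its list of
        feature vectors r^n_j.
    Returns a dict mapping each syllable symbol to a table where
    table[i][j] is the distance between a_i and r^n_j.
    """
    return {
        symbol: [[city_block(a_i, r_j) for r_j in template]
                 for a_i in input_pattern]
        for symbol, template in syllable_templates.items()
    }
```

Computing each table once per utterance means a syllable that appears in many dictionary words is compared with the input only a single time.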

The standard utterance duration of each vocabulary word is also prepared in advance by entering it from a keyboard or the like. 117 is a word cumulative distance calculation unit. For each word to be matched, it reads the already calculated inter-vector distances from the inter-vector distance storage unit 115, in the order of the monosyllables specified in the word dictionary 116, and calculates the cumulative distance up to point (i, j) for the word. For example, in Fig. 5, at the i-th frame, the inter-vector distances d^n(i, j) between each vector of the syllable standard patterns R^n = r^n_1, r^n_2, ..., r^n_J_n for n = 1, 2, ..., N (N is the number of syllable standard patterns) and the vector a_i of the i-th frame of the input pattern A = a_1, a_2, ..., a_I are read from the inter-vector distance storage unit 115, and the cumulative inter-vector distance between a_i and the concatenated pattern R^q(1) ⊕ R^q(2) ⊕ R^q(3) = r^q(1)_1, r^q(1)_2, ..., r^q(1)_J_q(1), r^q(2)_1, r^q(2)_2, ..., r^q(2)_J_q(2), r^q(3)_1, r^q(3)_2, ..., r^q(3)_J_q(3) is obtained.

With Fig. 4b adopted as the constraint on the matching path, and with the weighting coefficients along each path set to the values marked on the paths in that figure, the cumulative distance D^n(i, j) for the standard pattern R^n at coordinates (i, j) is given by a recurrence whose expression is missing from this text.

118 is a word determination unit, which takes the word giving the minimum value among the final cumulative distances obtained for each word by the word cumulative distance calculation unit 117 as the recognition result and outputs it from the recognition result output terminal 119.

The recognition result is also sent to the instruction activation unit 121, described below.

120 is a speech duration calculation unit. For example, it applies a threshold to the time series of energy levels obtained from the feature vector series, starts counting the speech duration when it detects the beginning of the input speech, and stops counting when it detects the end, thereby obtaining the speech duration.

121 is an instruction activation unit. It reads from the word dictionary 116 the standard utterance duration corresponding to the recognition result (word) obtained from the word determination unit 118 and computes the ratio α (= standard utterance duration / speech duration) to the speech duration obtained by the speech duration calculation unit 120. For thresholds TH_a and TH_b (TH_a < TH_b), if α < TH_a it outputs to the speech rate instruction unit 122 a signal S1 that triggers the instruction "Please speak a little more slowly," and if α > TH_b it outputs to the speech rate instruction unit 122 a signal S2 that triggers the instruction "Please speak a little faster." If TH_a ≤ α ≤ TH_b, no signal is generated.

122 is a speech rate instruction unit. When the signal S1 is input from the instruction activation unit 121, it outputs the instruction "Please speak a little more slowly" from the instruction output terminal 123, and when the signal S2 is input from the instruction activation unit 121, it outputs the instruction "Please speak a little faster" from the instruction output terminal 123. 123 is the instruction output terminal.
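The duration measurement and the ratio test of the first embodiment can be sketched as follows. The endpoint rule (first and last frame whose energy exceeds a threshold) is a simplifying assumption, as are the function names, the frame period parameter, and the threshold arguments `th_low` and `th_high`. The mapping from α to S1 and S2 is copied literally from the description; note that with α defined as standard duration divided by measured duration, α below the lower threshold means the utterance lasted longer than the standard.

```python
def speech_duration(energies, threshold, frame_period):
    """Duration between the first and last frame above the energy threshold.

    energies: per-frame energy levels derived from the feature vectors.
    frame_period: time covered by one frame, in any consistent unit.
    Returns 0 when no frame exceeds the threshold.
    """
    voiced = [k for k, e in enumerate(energies) if e > threshold]
    if not voiced:
        return 0
    return (voiced[-1] - voiced[0] + 1) * frame_period


def rate_signal(standard_duration, measured_duration, th_low, th_high):
    """Return "S1", "S2", or None according to the ratio alpha.

    alpha = standard_duration / measured_duration, as in the description.
    S1 triggers "Please speak a little more slowly" and S2 triggers
    "Please speak a little faster", following the text as written.
    """
    if measured_duration <= 0:
        raise ValueError("measured duration must be positive")
    alpha = standard_duration / measured_duration
    if alpha < th_low:
        return "S1"
    if alpha > th_high:
        return "S2"
    return None  # within the permitted range: no instruction
```

A caller would look up the standard duration of the recognized word in the dictionary, pass it with the measured duration, and forward any returned signal to whatever component presents the instruction to the speaker.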

Next, another embodiment of the present invention is described.

Fig. 2 shows another embodiment. The instruction activation unit 121 of the first embodiment compares the ratio of the standard utterance duration to the speech duration (α) with the thresholds (TH_a and TH_b) and outputs an instruction-generating signal when α < TH_a or α > TH_b. In contrast, the instruction activation unit 221 of this embodiment compares the number β of steady vowel portions, obtained by the steady portion detection unit 220 based on, for example, the amount of change in the feature vectors, with the number γ of syllables in the recognition result (word) obtained by the word determination unit 218. When β < γ it outputs the signal S1, and when β > γ it outputs the signal S2. In the steady portion detection unit 220, a steady vowel portion is defined, for example, as a portion in which frames whose feature vector change is at or below a threshold continue for at least a predetermined number of frames. In addition, whereas the word dictionary 116 of the first embodiment stores the vocabulary and the standard utterance durations of the vocabulary, the word dictionary 216 of this embodiment stores only the vocabulary. That is, in Fig. 2, the units denoted 211 to 215, 217 to 219, and 222 to 223 operate in exactly the same way as in the first embodiment, and the word dictionary 216, the steady portion detection unit 220, and the instruction activation unit 221 operate as described above.
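The second embodiment's test can be sketched as below. The change measure (city-block difference between consecutive frames) and the function names are assumptions, and the sketch counts every steady run without distinguishing vowels from silence or other steady sounds, which a real detector would need to handle.

```python
def count_steady_parts(features, change_threshold, min_frames):
    """Count runs of low frame-to-frame change lasting at least min_frames.

    features: list of feature vectors, one per frame.
    A frame transition is "steady" when the summed absolute difference
    between consecutive vectors is at or below change_threshold.
    """
    count = 0
    run = 0
    for prev, cur in zip(features, features[1:]):
        change = sum(abs(x - y) for x, y in zip(prev, cur))
        if change <= change_threshold:
            run += 1
        else:
            if run >= min_frames:
                count += 1
            run = 0
    if run >= min_frames:  # close a run that reaches the final frame
        count += 1
    return count


def rate_signal_from_counts(beta, gamma):
    """Compare steady-part count beta with syllable count gamma.

    Fewer steady parts than syllables yields "S1" (speak more slowly);
    more steady parts than syllables yields "S2" (speak faster).
    """
    if beta < gamma:
        return "S1"
    if beta > gamma:
        return "S2"
    return None
```

Unlike the duration ratio, this test needs no per-word standard duration, which is why the dictionary in this embodiment stores only the vocabulary.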

As described above, according to this embodiment, a speech rate instruction unit is provided to give the speaker instructions regarding the speech rate as needed. This controls the speaker's speech rate and avoids the insertion of extra syllables caused by an extremely slow speech rate and the dropping of syllables caused by an extremely fast speech rate, so that deterioration of recognition accuracy due to variations in speech rate is prevented.

The functions of the components of the embodiments described above can also be implemented in software.

Effects of the Invention

The voice input device of the present invention is provided with a speech rate instruction unit that gives the speaker instructions regarding the speech rate as needed. This controls the speaker's speech rate and avoids the insertion of extra syllables caused by an extremely slow speech rate and the dropping of syllables caused by an extremely fast speech rate, so that deterioration of recognition accuracy due to variations in speech rate is prevented.

[Brief Description of the Drawings]

Fig. 1 is a block diagram showing one embodiment of the present invention, Fig. 2 is a block diagram showing another embodiment of the present invention, Fig. 3 is a block diagram showing a conventional example, Figs. 4a and 4b are diagrams explaining the principle of DP matching, and Fig. 5 is a diagram explaining the principle of recognizing word speech using syllable standard patterns in the embodiments of the present invention.

112, 212: feature extraction unit
113, 213: syllable standard pattern storage unit
114, 214: inter-vector distance calculation unit
115, 215: inter-vector distance storage unit
116, 216: word dictionary
117, 217: word cumulative distance calculation unit
118, 218: word determination unit
120: speech duration calculation unit
121, 221: instruction activation unit
122, 222: speech rate instruction unit
220: steady portion detection unit

Agent: Patent Attorney Toshio Nakao and one other

Claims

[Claims]

(1) A voice input device comprising: feature extraction means for converting an input speech signal into an input pattern A consisting of a series of feature vectors (a_1, a_2, ..., a_i, ..., a_I); syllable standard pattern storage means for storing syllable standard patterns R^n = (r^n_1, r^n_2, ..., r^n_j, ..., r^n_J) (n = 1, 2, ..., N); a word dictionary that stores a vocabulary and the standard utterance durations of the vocabulary; inter-vector distance calculation means for calculating the inter-vector distance d^n(i, j) between the feature vector r^n_j (j = 1, 2, ..., J_n) constituting the standard pattern R^n and the feature vector a_i of the i-th frame of the input pattern A; inter-vector distance storage means for storing the inter-pattern distances obtained by the inter-vector distance calculation means in correspondence with each syllable standard pattern; word cumulative distance calculation means for calculating the cumulative distance of the inter-vector distances d^n(i, j) between the input pattern and the vectors constituting a word standard pattern formed by combining the standard patterns; word determination means for determining a word from the word dictionary based on the result of the word cumulative distance calculation means; speech duration calculation means for detecting the beginning and end of the input speech and calculating the duration between the beginning and the end; speech rate instruction means for issuing an instruction regarding the speech rate; and instruction activation means for prompting the speech rate instruction means to issue an instruction when, for the result of the word determination means, the ratio of the standard utterance duration stored in the word dictionary to the speech duration is not within a predetermined range.

(2) A voice input device comprising: feature extraction means for converting an input speech signal into an input pattern A consisting of a series of feature vectors (a_1, a_2, ..., a_i, ..., a_I); syllable standard pattern storage means for storing syllable standard patterns R^n = (r^n_1, r^n_2, ..., r^n_j, ..., r^n_J) (n = 1, 2, ..., N); inter-vector distance calculation means for calculating the inter-vector distance d^n(i, j) between the feature vector r^n_j (j = 1, 2, ..., J_n) constituting the standard pattern R^n and the feature vector a_i of the i-th frame of the input pattern A; inter-vector distance storage means for storing the inter-pattern distances obtained by the inter-vector distance calculation means in correspondence with each syllable standard pattern; word cumulative distance calculation means for calculating the cumulative distance of the inter-vector distances d^n(i, j) between the input pattern and the vectors constituting a word standard pattern formed by combining the standard patterns; word determination means for determining a word from the word dictionary based on the result of the word cumulative distance calculation means; speech rate instruction means for issuing an instruction regarding the speech rate; a word dictionary that stores a vocabulary; steady portion detection means for detecting the number of steady portions of the input speech; and instruction activation means for prompting the speech rate instruction means to issue an instruction when the number of vowels constituting the word obtained by the word determination means and the number of steady portions obtained by the steady portion detection means are not equal.