JPS6225796A - Voice recognition equipment - Google Patents

Voice recognition equipment

Info

Publication number
JPS6225796A
Authority
JP
Japan
Prior art keywords
vowel
recognition
stationary point
point detection
stationary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP16511685A
Other languages
Japanese (ja)
Inventor
紀代 原
喜一 長谷川
入路 友明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Panasonic Holdings Corp
Original Assignee
Matsushita Electric Industrial Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Matsushita Electric Industrial Co Ltd filed Critical Matsushita Electric Industrial Co Ltd
Priority to JP16511685A priority Critical patent/JPS6225796A/en
Publication of JPS6225796A publication Critical patent/JPS6225796A/en
Pending legal-status Critical Current

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION

Field of Industrial Application

The present invention relates to improvements in speech recognition devices, and more particularly to improving the recognition rate.

Background Art

Speech recognition is a field in which practical use is anticipated as a man-machine interface, for example for input to word processors and computers.

Speech recognition devices differ in the unit used to recognize the input speech: some use the monosyllable (CV, where C denotes a consonant and V a vowel), some use CV and VCV units, and some use phonemes (C and V). They also divide into registration-type (speaker-dependent) systems, in which the user first utters and registers reference speech before recognition begins, and speaker-independent systems, which prepare standard patterns in advance by statistical processing of a large amount of speech data and require no registration by the user. As methods of feature extraction, linear predictive analysis and filter banks are the mainstream. Here, both the conventional example and the embodiment are described as speaker-independent speech recognition devices using CV and VCV as recognition units and linear predictive analysis for feature extraction. An example of a conventional speech recognition device is described below with reference to the drawings.

FIG. 3 is a block diagram showing the configuration of a speaker-independent speech recognition device. Speech entering at speech input terminal 21 is subjected to linear predictive analysis in feature extraction unit 22, using the autocorrelation method with a 20 msec window length, a 5 msec frame shift, and 15th order, and is output as a set of 16 parameters per frame: 15 cepstrum coefficients plus the residual power. (On linear predictive analysis, see J. D. Markel and A. H. Gray, Linear Prediction of Speech, Japanese translation by Hisaki Suzuki, Corona, 1980.) Next, silence detection unit 23 uses the residual power to determine the beginning and end of the word and any silent intervals within it. Vowel recognition unit 24 reads coefficients from discriminant function storage 25, which holds the coefficients of vowel discriminant functions obtained in advance by statistical processing of a large amount of speech data (see S. Yasuda, Social Statistics, Chapter 2, Section 7, Maruzen, 1969), and performs vowel recognition frame by frame on the portions other than the silent intervals detected by silence detection unit 23. Stationary point detection unit 26 extracts the stable entries from the per-frame vowel recognition results produced by vowel recognition unit 24 and outputs them as a sequence of vowel stationary points. Phoneme recognition unit 27 reads standard patterns from standard pattern storage 28, prepared in advance, performs DP matching against the input pattern, and outputs the standard pattern giving the minimum distance as the recognized phoneme string. Word recognition unit 29 compares the phoneme recognition result obtained by unit 27 with word dictionary 30, which is stored as symbol strings, and delivers the final word-level recognition result at recognition result output terminal 12.

(For example, Mifune et al., IECE technical report PRL83-40. That paper uses a filter bank rather than linear predictive analysis as the feature extraction means, and uses the inter-frame variance instead of the per-frame vowel recognition results as the means of detecting vowel stationary points, but it can be cited as one conventional example.)

Problems to Be Solved by the Invention

In such conventional speech recognition devices, only a single threshold is applied to the stability measure for detecting vowel stationary points (stability can be expressed, for example, as the number of consecutive frames over which the same vowel recognition result persists). As a result, when the speaking rate rises, vowel stationary points are dropped; when it falls, spurious points are inserted; and the recognition rate degrades as the speaking rate varies. Moreover, fluctuation in a person's speaking rate is unavoidable: keeping the rate constant at all times is difficult and places a heavy burden on the speaker.

The present invention has been made in view of these points. Several thresholds are provided for detecting vowel stationary points, and the speaking rate is estimated in a simple way from which threshold was used to determine the first stationary point; stationary point detection matched to that rate is then performed. The object is thereby to reduce the insertion and dropping of vowel stationary points caused by variations in speaking rate, and further to lighten the burden on the speaker.

Means for Solving the Problems

To solve the above problems, the present invention provides several stepped thresholds for vowel stationary point detection, estimates the speaking rate, and performs stationary point detection that takes the speaking rate into account, thereby coping with variations in speaking rate.

Operation

This means operates as follows. Several thresholds are provided on the stability measure (the number of consecutive frames) used in vowel stationary point detection; the speaking rate is estimated first, and vowel stationary point detection suited to that rate is then performed. This reduces the insertion and dropping of vowel stationary points that accompany variations in speaking rate, improving the recognition rate. And since the speaking rate need not be held constant, the burden on the speaker can be reduced.

EMBODIMENT

An embodiment of the present invention is described below. FIG. 1 is a block diagram of this embodiment. Speech entering at speech input terminal 1 is subjected to linear predictive analysis in feature extraction unit 2, using the autocorrelation method with a 20 msec window length, a 5 msec frame shift, and 15th order, and is output as a set of 16 parameters per frame: 15 cepstrum coefficients plus the residual power. Next, silence detection unit 3 uses the residual power to determine the beginning and end of the word and any silent intervals within it. Vowel recognition unit 4 reads coefficients from discriminant function storage 5, which holds the coefficients of discriminant functions obtained in advance by statistical processing of a large amount of speech data, and performs vowel recognition frame by frame on the portions other than the silent intervals detected by silence detection unit 3. Reference numeral 6 denotes the vowel stationary point detection unit: speaking rate estimation unit 6a first estimates the speaking rate, and on the basis of that result, stationary point detection unit 6b extracts the stable entries from the sequence of per-frame vowel recognition results obtained by vowel recognition unit 4 and outputs them as a sequence of stationary points.
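The per-frame vowel decision in units 4 and 5 amounts to evaluating one linear discriminant function per vowel and taking the maximum. A minimal sketch: the weight matrix W and bias b stand in for the pretrained discriminant-function coefficients read from storage 5 (their training, by statistical processing of speech data, is outside this sketch), and the five-vowel label set reflects Japanese.

    import numpy as np

    VOWELS = ["a", "i", "u", "e", "o"]

    def classify_frames(feats, W, b, silence_mask):
        """feats: (T, 16) per-frame parameter sets; W: (5, 16) and b: (5,)
        are pretrained discriminant-function coefficients; silence_mask:
        (T,) bool, True where the silence detector fired."""
        scores = feats @ W.T + b            # one discriminant score per vowel
        best = scores.argmax(axis=1)        # best-scoring vowel per frame
        return [None if sil else VOWELS[k]  # silent frames get no vowel label
                for k, sil in zip(best, silence_mask)]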

Stationary point detection unit 6 is described in detail later. Phoneme recognition unit 7 reads standard patterns from standard pattern storage 8, prepared in advance, performs DP matching against the input pattern, and outputs the phoneme of the standard pattern giving the minimum distance as the recognized phoneme string. Word recognition unit 9 compares the phoneme recognition result obtained by unit 7 with word dictionary 10, which is stored as symbol strings, and delivers the final word-level recognition result at recognition result output terminal 11.

Next, the algorithm of stationary point detection unit 6 is described in detail. FIG. 2 is a flowchart outlining the processing in stationary point detection unit 6 (both 6a and 6b). Taking as input the per-frame vowel recognition results obtained by vowel recognition unit 4, the number of consecutive frames of the same vowel is determined. For example, if the input vowel sequence is aaaaaaaaaaaa eeeeee ieee iiii, the surviving runs are a:12, e:6, i:4 (runs of 3 or fewer consecutive frames are excluded); these values 12, 6, 4 are the continuation frame counts nf_i. (In this example there are three values: nf1, nf2, nf3.)

Then nf = max(nf1, nf2, nf3) is taken as the maximum continuation frame count within the input pattern; this vowel is the most stable one. When nf > TH1 (threshold 1), stationary point detection is performed on the assumption that the speaking rate is relatively slow; when nf > TH2 (threshold 2, where TH1 > TH2), the speaking rate is assumed to be normal; otherwise (nf < TH2) it is assumed to be fast. Timing processing is then applied to add or delete stationary points, and the result is output as the vowel stationary points. (Timing processing adds and deletes stationary points under the assumption that two stationary points can be neither too close together nor too far apart; it exploits the fact that in Japanese the vowels occur at roughly equal intervals. This embodiment uses TH1 = 25 and TH2 = 15.)

As described above, the speaking rate is estimated by providing stepped thresholds for stationary point detection, and stationary point detection matched to the speaking rate is then performed. As a result, the insertion and dropping of vowel stationary points accompanying variations in speaking rate can be reduced. Moreover, since the speaker need not force the speaking rate to stay constant, the speaker's burden can be lightened.

In the embodiment two thresholds, TH1 and TH2, are used, but the present invention places no restriction whatever on the number of thresholds. Likewise, although a speaker-independent recognition device using CV and VCV as recognition units has been described, this in no way limits the present invention.

Effects of the Invention

As described above, according to the present invention the speaking rate can be estimated by a simple method, and by performing vowel stationary point detection matched to the speaking rate, the dropping and insertion of vowel stationary points due to variations in speaking rate can be reduced and the recognition rate improved. Furthermore, since there is no need to force the speaking rate to stay constant, the burden on the speaker can be lightened.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a speech recognition device according to one embodiment of the present invention, FIG. 2 is a flowchart outlining the algorithm of the present invention, and FIG. 3 is a block diagram of a conventional example.

1: speech input terminal; 2: feature extraction unit; 3: silence detection unit; 4: vowel recognition unit; 5: discriminant function storage; 6: stationary point detection unit; 6a: speaking rate estimation unit; 6b: stationary point detection unit; 7: phoneme recognition unit; 8: standard pattern storage; 9: word recognition unit; 10: word dictionary storage; 11: recognition result output terminal.

Claims (1)

[Claims] A speech recognition device comprising: speech input means; feature extraction means for performing feature extraction at fixed intervals on the speech input from the speech input means and extracting a feature parameter sequence; vowel recognition means for performing vowel recognition on the feature parameter sequence; stationary point detection means for detecting stable portions from the vowel recognition results and outputting them as a sequence of vowel stationary points; standard pattern storage means for storing standard patterns prepared in advance for each phoneme to be recognized; and phoneme recognition means for comparing the feature parameter sequence with each of the standard patterns and converting the feature parameter sequence into a phoneme sequence; the speech recognition device being characterized in that the stationary point detection means provides stepped thresholds for detecting stationary points, estimates the speaking rate, and then performs stationary point detection.
JP16511685A 1985-07-26 1985-07-26 Voice recognition equipment Pending JPS6225796A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP16511685A JPS6225796A (en) 1985-07-26 1985-07-26 Voice recognition equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP16511685A JPS6225796A (en) 1985-07-26 1985-07-26 Voice recognition equipment

Publications (1)

Publication Number Publication Date
JPS6225796A true JPS6225796A (en) 1987-02-03

Family

ID=15806213

Family Applications (1)

Application Number Title Priority Date Filing Date
JP16511685A Pending JPS6225796A (en) 1985-07-26 1985-07-26 Voice recognition equipment

Country Status (1)

Country Link
JP (1) JPS6225796A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH01162887A (en) * 1987-11-23 1989-06-27 Sued Chemie Ag Additive of bleaching agent
JPH03137285A (en) * 1989-10-18 1991-06-11 Air Prod And Chem Inc Method of decarbonizing and bleaching pulp and treating secondary cellulosic fiber


Similar Documents

Publication Publication Date Title
CN108198547B (en) Voice endpoint detection method and device, computer equipment and storage medium
CN110211565B (en) Dialect identification method and device and computer readable storage medium
US7177810B2 (en) Method and apparatus for performing prosody-based endpointing of a speech signal
CN110570876A (en) Singing voice synthesis method and device, computer equipment and storage medium
JP4666129B2 (en) Speech recognition system using speech normalization analysis
JP5296455B2 (en) Speaker identification device and computer program
JPS6138479B2 (en)
JPS6225796A (en) Voice recognition equipment
JPH1097285A (en) Speech recognition system
JPS63161499A (en) Voice recognition equipment
JP3058569B2 (en) Speaker verification method and apparatus
JPS6225797A (en) Voice recognition equipment
JPS6355600A (en) Voice recognition equipment
JPS60164800A (en) Voice recognition equipment
JPS60198596A (en) Syllable boundary selection system
JP2760096B2 (en) Voice recognition method
JPS61180300A (en) Voice recognition equipment
JPS63161500A (en) Voice recognition equipment
JPH045699A (en) Nasal sound section detection system
JPS63217399A (en) Voice section detecting system
JPS6355599A (en) Voice recognition equipment
KR960007132B1 (en) Voice recognizing device and its method
JPS62280800A (en) Plosive consonant identification system
JP2000242292A (en) Voice recognizing method, device for executing the method, and storage medium storing program for executing the method
JPS6136798A (en) Voice segmentation