JPS61183697A

JPS61183697A - Monosyllable voice recognition equipment

Info

Publication number: JPS61183697A
Application number: JP60023858A
Authority: JP
Inventors: 謙二松井
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1985-02-08
Filing date: 1985-02-08
Publication date: 1986-08-16
Also published as: JPH0569240B2

Abstract

(57) [Summary] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION

Field of Industrial Application

The present invention relates to a monosyllabic speech recognition device that recognizes input monosyllabic speech using pre-registered standard patterns of monosyllabic speech.

Prior Art

A conventional monosyllabic speech recognition device is described, for example, in the Acoustical Society of Japan Speech Study Group material "Development of a Small Monosyllable Recognition Device", 5ea-41, 1983-10.

Fig. 3 shows the configuration of this conventional monosyllabic speech recognition device. Reference numeral 1 denotes an input section that converts speech into a digitized electrical signal; 2, a feature extraction section that extracts speech features from the signal digitized by the input section 1 by means such as frequency analysis; 3, a vowel-consonant boundary detection section that detects the boundary between the vowel part and the consonant part of a monosyllabic input pattern and separates the two; 4, a vowel input pattern storage section that stores the separated vowel pattern; 5, a consonant input pattern storage section that stores the separated consonant pattern; 6, a vowel standard pattern storage section that stores several vowel patterns in advance; 7, a consonant standard pattern storage section that stores several consonant patterns in advance; 8, a vowel similarity calculation section that calculates the similarity between the pattern stored in the vowel input pattern storage section 4 and the patterns in the vowel standard pattern storage section 6; 9, a consonant similarity calculation section that calculates the similarity between the pattern stored in the consonant input pattern storage section 5 and the patterns in the consonant standard pattern storage section 7; and 10, an output section that determines, from the outputs of the vowel similarity calculation section 8 and the consonant similarity calculation section 9, which monosyllable the input pattern was recognized as, and outputs the result.

The operation of the monosyllabic speech recognition device configured as above is described concretely below.

The uttered monosyllabic speech is input to the input section 1, which converts the speech signal into an electrical signal and digitizes it. Features are then extracted from this signal by frequency analysis using digital filters. The speech pattern obtained by feature extraction is separated into a vowel part and a consonant part by the vowel-consonant boundary detection section 3, as shown in Fig. 4a.

Various methods of vowel-consonant boundary detection are conceivable; here, as an example, a method of fitting a trapezoidal template to the power pattern of the speech is described. As shown in Fig. 4b, the distance 13 between the speech power pattern 11 and the template 12 is calculated, and the point at which this distance is minimal is taken as the vowel-consonant boundary. The vowel pattern and consonant pattern separated in this way are stored in the vowel input pattern storage section 4 and the consonant input pattern storage section 5, respectively.
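
The template fitting described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the device's actual procedure: the names `trapezoid_level` and `detect_boundary`, the `ramp` width, the use of the minimum and maximum power as the base and plateau levels of the template, and the sum of absolute differences as the distance are all hypothetical. The source specifies only that the distance between the power pattern and a trapezoidal template is calculated and minimized.

```python
def trapezoid_level(n, length, boundary, ramp):
    """Height (0..1) of a hypothetical trapezoid at frame n.

    The trapezoid starts rising at `boundary`, reaches its plateau `ramp`
    frames later, and falls back to 0 over the last `ramp` frames.
    """
    rise = min(max((n - boundary) / ramp, 0.0), 1.0)
    fall = min(max((length - 1 - n) / ramp, 0.0), 1.0)
    return min(rise, fall)


def detect_boundary(power, ramp=3):
    """Return the candidate frame whose template is closest to `power`."""
    low, high = min(power), max(power)
    length = len(power)

    def distance(boundary):
        # Assumed distance: sum of absolute differences over all frames.
        return sum(
            abs(p - (low + (high - low) * trapezoid_level(n, length, boundary, ramp)))
            for n, p in enumerate(power)
        )

    # Try every frame as the boundary and keep the one with minimum distance.
    return min(range(length), key=distance)


# Synthetic power pattern: low-power consonant part, then a vowel plateau.
power = [1, 1, 1, 1, 1, 4, 7, 10, 10, 10, 10, 10, 10, 10, 7, 4, 1]
print(detect_boundary(power))
```

For this synthetic pattern the minimum falls at frame 4, the last low-power frame before the rise. Because every frame is a candidate, a noisy power pattern can pull the minimum far from the true boundary.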

When the recognition device is in registration mode, the input patterns in the vowel input pattern storage section 4 and the consonant input pattern storage section 5 are stored in the vowel standard pattern storage section 6 and the consonant standard pattern storage section 7, respectively.

In recognition mode, the vowel similarity calculation section 8 outputs to the output section 10 the number of the vowel standard pattern closest to the vowel input pattern. Likewise, the consonant similarity calculation section 9 outputs to the output section 10 the number of the consonant standard pattern closest to the consonant input pattern. The output section 10 combines the consonant recognition result and the vowel recognition result and outputs the monosyllable recognition result.
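
The recognition step described above can be sketched as a nearest-pattern search. The source does not specify the similarity measure or the pattern format, so this sketch assumes equal-length feature vectors compared by squared Euclidean distance, and it uses string labels in place of the pattern numbers. The names `nearest_pattern` and `recognize` and the sample patterns are hypothetical.

```python
def nearest_pattern(input_pattern, standard_patterns):
    """Return the label of the standard pattern closest to the input.

    Assumption: smaller squared Euclidean distance means higher similarity.
    """
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(
        standard_patterns,
        key=lambda label: distance(input_pattern, standard_patterns[label]),
    )


def recognize(vowel_input, consonant_input, vowel_standards, consonant_standards):
    """Combine the consonant and vowel results into one monosyllable label."""
    consonant = nearest_pattern(consonant_input, consonant_standards)
    vowel = nearest_pattern(vowel_input, vowel_standards)
    return consonant + vowel


# Made-up standard patterns, standing in for those stored in registration mode.
vowel_standards = {"a": [1.0, 0.0], "i": [0.0, 1.0]}
consonant_standards = {"k": [0.5, 0.5, 0.0], "s": [0.0, 0.2, 0.9]}

print(recognize([0.9, 0.1], [0.1, 0.3, 0.8], vowel_standards, consonant_standards))
```

Both searches depend on the vowel and consonant parts having been separated at the right frame, which is why a boundary error degrades both results.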

Problems to Be Solved by the Invention

The monosyllable recognition device configured as above has the following problem: the vowel-consonant boundary detection section 3 may misplace the boundary by a large margin. Fig. 6 shows a case in which the vowel-consonant boundary detection section 3 detects the boundary incorrectly. When the boundary is detected incorrectly in this way, it becomes a major cause of misrecognition for both the consonant part and the vowel part.

In view of this, an object of the present invention is to provide a monosyllabic speech recognition device that reduces the detection error of the vowel-consonant boundary position and thereby improves the recognition rate.

Means for Solving the Problems

The present invention provides a cumulative function calculation section that, prior to vowel-consonant boundary detection, calculates a cumulative function of the change in the features of the monosyllabic speech and takes the point at which the accumulated change has increased by a certain amount as a provisional vowel-consonant boundary, thereby limiting the range in which the vowel-consonant boundary can lie.

Operation

By calculating the cumulative function for the input monosyllabic speech, the present invention can confine the vowel-consonant boundary to within a certain range. Therefore, even if the vowel-consonant boundary detection section 3 makes an error, the error is confined to that range, so the large deviations that cause misrecognition are avoided and the recognition rate is improved.

Embodiment

Fig. 1 shows the configuration of a monosyllabic speech recognition device according to an embodiment of the present invention. In the figure, 1 is an input section, 2 is a feature extraction section, 3 is a vowel-consonant boundary detection section, 4 is a vowel input pattern storage section, 5 is a consonant input pattern storage section, and 6 is a vowel standard pattern storage section; these are the same as in Fig. 3. The difference from the configuration of Fig. 3 is that a cumulative function calculation section 11 is provided between the feature extraction section 2 and the vowel-consonant boundary detection section 3.

The operation of the monosyllabic speech recognition device of this embodiment, configured as above, is described below.

The uttered monosyllabic speech is input to the input section 1, which converts the speech signal into an electrical signal and digitizes it. The feature extraction section 2 extracts features from this signal by means such as frequency analysis. Fig. 2a shows, as an example, monosyllabic speech frequency-analyzed by 16 digital filters. For the speech pattern obtained by feature extraction, the cumulative function calculation section 11 calculates the cumulative function.

That is, the monosyllabic speech is converted into a sequence of frequency spectrum patterns, each averaged over a fixed time interval T. Denoting this sequence by $F(n)$,

$$F(n) = \bigl(x_0(n),\ x_1(n),\ \ldots,\ x_m(n)\bigr) \tag{1}$$

where $n$ is the frame number when the time interval T is taken as one frame, and $x_i(n)$ ($i = 0, 1, \ldots, m$) are the frequency components of frame $n$. Next, the amount of change in $x_i$ between adjacent frames is accumulated to give the cumulative function $\mathrm{accf}(n)$ (equation (2); its expression is missing from the source text). Let $\mathrm{accf}(N)$ be the value of the cumulative function at the moment capture of the monosyllabic speech ends, and let the correct vowel-consonant boundary be at frame $n_s$. Then values $\alpha$ and $\varepsilon$ can be chosen such that

$$\mathrm{accf}(n_a) = \mathrm{accf}(N) \times \frac{1}{\alpha} \tag{3}$$

$$n_a - \varepsilon < n_s < n_a + \varepsilon \tag{4}$$

Here $n_a$ is the frame at which the cumulative function equals $1/\alpha$ of its final value. Fig. 2b shows, as an example, the cumulative function obtained for Fig. 2a. Next, using the values of $\alpha$ and $\varepsilon$ in equations (3) and (4), which are determined experimentally in advance, the existence range of the vowel-consonant boundary given by equation (4) is determined. Fig. 2c shows, as an example, how the existence range is determined from the cumulative function of Fig. 2b with $\alpha = 2$ and $\varepsilon = 16$. This range information is then sent to the vowel-consonant boundary detection section 3, which determines the vowel-consonant boundary within the range and separates the vowel part from the consonant part.

The separation of the vowel part and the consonant part is shown in Fig. 2d. Here, as in the conventional example, the method of fitting a trapezoidal template to the speech power pattern is used.
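
The cumulative function and the existence range of equations (3) and (4) can be sketched in Python as follows. This is a sketch under assumptions: the source text does not give the expression for the per-frame change, so the code takes it to be the sum of absolute differences of the frequency components between adjacent frames. Because frames are discrete, it also replaces the equality in equation (3) with the first frame at which the cumulative function reaches accf(N)/α. The names `cumulative_change` and `boundary_range` and the sample data are hypothetical.

```python
def cumulative_change(frames):
    """Return accf as a list, with accf[0] = 0.

    Assumption: the change between adjacent frames is the sum of absolute
    differences of their frequency components.
    """
    accf = [0.0]
    for prev, cur in zip(frames, frames[1:]):
        change = sum(abs(c - p) for p, c in zip(prev, cur))
        accf.append(accf[-1] + change)
    return accf


def boundary_range(accf, alpha=2.0, epsilon=16):
    """Return (n_a, lo, hi): provisional boundary and inclusive search range."""
    target = accf[-1] / alpha
    # Discrete stand-in for equation (3): first frame reaching accf(N) / alpha.
    n_a = next(n for n, value in enumerate(accf) if value >= target)
    # Equation (4) is a strict inequality, so the end points are excluded,
    # and the range is clipped to the frames that actually exist.
    lo = max(0, n_a - epsilon + 1)
    hi = min(len(accf) - 1, n_a + epsilon - 1)
    return n_a, lo, hi


# Six frames with two frequency components each (made-up values).
frames = [[0, 0], [0, 0], [1, 0], [3, 1], [3, 1], [3, 1]]
accf = cumulative_change(frames)
print(accf)
print(boundary_range(accf, alpha=2.0, epsilon=2))
```

A boundary detector such as the template fitting would then evaluate only candidate frames from `lo` to `hi`, so that any detection error stays within ε frames of the provisional boundary.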

The vowel pattern and consonant pattern separated in this way are stored in the vowel input pattern storage section 4 and the consonant input pattern storage section 5, respectively.

When the recognition device is in registration mode, the input patterns in the vowel input pattern storage section 4 and the consonant input pattern storage section 5 are stored in the vowel standard pattern storage section 6 and the consonant standard pattern storage section 7, respectively.

In recognition mode, the vowel similarity calculation section 8 outputs to the output section 10 the number of the vowel standard pattern closest to the vowel input pattern. Likewise, the consonant similarity calculation section 9 outputs to the output section 10 the number of the consonant standard pattern closest to the consonant input pattern. The output section 10 combines the consonant recognition result and the vowel recognition result and outputs the monosyllable recognition result.

As explained above, according to the present invention, providing the cumulative function calculation section 11 makes it easy to limit the existence range of the vowel-consonant boundary, which reduces the vowel-consonant boundary detection error and improves the recognition rate.

Effects of the Invention

As described above, according to the present invention, calculating the cumulative function prior to vowel-consonant boundary detection makes it easy to limit the existence range of the vowel-consonant boundary and keeps the vowel-consonant boundary detection error within a small range.

As a result, the recognition rate is stabilized and improved, and the practical benefit is considerable.

[Brief explanation of drawings]

Fig. 1 is a block diagram of a monosyllabic speech recognition device according to one embodiment of the present invention; Fig. 2 is a diagram explaining the operation of the embodiment; Fig. 3 is a block diagram of a conventional monosyllabic speech recognition device; Fig. 4 is a diagram explaining the operation of the conventional example; and Fig. 6 is an explanatory diagram showing an example of erroneous vowel-consonant boundary detection in the conventional example.

1: input section; 2: feature extraction section; 3: vowel-consonant boundary detection section; 4: vowel input pattern storage section; 5: consonant input pattern storage section; 6: vowel standard pattern storage section; 7: consonant standard pattern storage section; 8: vowel similarity calculation section; 9: consonant similarity calculation section; 10: output section; 11: cumulative function calculation section according to the present invention.

Claims

[Claims]

A monosyllabic speech recognition device characterized by comprising: an input section that converts input monosyllabic speech into an electrical signal; a feature extraction section that extracts speech features from the output signal of the input section; a cumulative function calculation section that calculates a function accumulating, at fixed time intervals, the change in the speech features extracted by the feature extraction section, and that sets the vowel-consonant boundary existence range of the monosyllabic speech to the vicinity of the point in time at which the value of the cumulative function equals its value at the end of speech capture divided by a certain constant; a vowel-consonant boundary detection section that detects the vowel-consonant boundary within the existence range determined by the cumulative function calculation section; a vowel input pattern storage section that stores the vowel pattern separated by the vowel-consonant boundary detection section; a consonant input pattern storage section that stores the consonant pattern separated by the vowel-consonant boundary detection section; a vowel standard pattern storage section that stores vowel standard patterns; a consonant standard pattern storage section that stores consonant standard patterns; a vowel similarity calculation section that calculates the vowel similarity between the contents of the vowel input pattern storage section and the vowel standard pattern storage section; a consonant similarity calculation section that calculates the consonant similarity between the contents of the consonant input pattern storage section and the consonant standard pattern storage section; and an output section that outputs a recognition result based on the outputs of the vowel similarity calculation section and the consonant similarity calculation section.