JPS5879297A

JPS5879297A - Voice recognition equipment

Info

Publication number: JPS5879297A
Application number: JP17764581A
Authority: JP
Inventors: 達伊福部; 積田　伸夫; 道夫倉田
Original assignee: Dai Nippon Printing Co Ltd
Current assignee: Dai Nippon Printing Co Ltd
Priority date: 1981-11-05
Filing date: 1981-11-05
Publication date: 1983-05-13

Abstract

(57) [Summary] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention] This invention relates to a speech recognition device, and more particularly to a monosyllabic speech recognition device that uses a palatograph as an auxiliary recognition device.

In this invention, information on tongue movement from an artificial palate is dynamically analyzed to classify the input speech into four consonant groups, and the classification result is transferred to the speech recognition device as one of four codes. From among the monosyllables that it has independently recognized as having at least a predetermined degree of similarity, the speech recognition device outputs the one that matches the consonant group and has the highest similarity as the code corresponding to the input speech. As a result, among the consonants that were conventionally difficult to distinguish, those with different tongue movements can be clearly recognized, and the recognition rate was improved by 2 to 5% over the conventional rate, although the gain varies between individuals.

That is, as shown in FIG. 1, this invention comprises: a parameter extraction device 2 that extracts feature parameters of monosyllabic speech (VS) input through a microphone 1; a palatograph analysis device 4 that analyzes the palatograph signal PG from an artificial palate 5 and outputs the corresponding consonant group code GC; a speech recognition device 3 that, from among the monosyllable codes having at least a predetermined similarity obtained by comparing the feature parameters FP extracted by the parameter extraction device 2 with monosyllable feature parameters stored in advance in an internal storage device, outputs as the corresponding monosyllable code MC the one whose consonant part matches the consonant group code GC from the palatograph analysis device 4 and whose similarity is highest; and a display device 6 that displays the monosyllable code MC from the speech recognition device 3.

As shown in FIG. 2, the parameter extraction device 2 comprises a preprocessing circuit 21 that amplifies and preprocesses the speech signal VS, a band-pass filter group that divides the preprocessed speech signal VSA into bands with mutually different center frequencies, a channel selection circuit that sequentially selects each of the divided band signals under control signal C81, and a sampling circuit that samples the selected channel signal CH at a predetermined timing under control signal C82. The palatograph analysis device 4 comprises an artificial palate control device 41 that sends out the palatograph signal PG from the artificial palate 5 at fixed periodic intervals, a data conversion circuit 42 that converts this output information into the required data, and a consonant group classification device 43 that outputs the consonant group code GC in response to the vowel-part start signal C83 and the transfer request signal C84 from the speech recognition device 3. A keyboard or similar device for correcting misrecognitions is connected to the speech recognition device 3, so that misrecognitions can easily be corrected while watching the display device 6.

In this configuration, the speech signal VS from the microphone 1 is amplified and preprocessed by the preprocessing circuit 21 in the parameter extraction device 2 and applied to the band-pass filter group, whose filters have mutually different center frequencies. The speech signal divided into bands is then sequentially selected by the channel selection circuit and sent to the sampling circuit in the following stage, where it is sampled.

The timing of the channel selection circuit and the sampling circuit is controlled by the control signals C81 and C82 from the speech recognition device 3. The band-divided speech signals are arranged serially in time and output to the speech recognition device 3 as feature parameters in the form of digital information.
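The time-serial arrangement of the band signals can be pictured with a minimal Python sketch. It is an illustration only, not the circuit disclosed in the source: the function name `multiplex_bands`, the 0.0 to 1.0 input range, and the 256-level quantization are all assumptions, and the band-pass filtering itself is taken as already done.

```python
from typing import List, Sequence


def multiplex_bands(band_frames: Sequence[Sequence[float]],
                    levels: int = 256) -> List[int]:
    """Serialize per-band outputs into one digital feature stream.

    band_frames[t][ch] is the already filtered output of band `ch` at
    sampling instant `t`, assumed (hypothetically) to lie in 0.0 .. 1.0.
    """
    stream: List[int] = []
    for frame in band_frames:      # one sweep of the channel selector (C81)
        for value in frame:        # bands are selected one after another
            clipped = min(max(value, 0.0), 1.0)
            # Sampling step (C82): truncate to an integer level.
            stream.append(int(clipped * (levels - 1)))
    return stream
```

Each inner list is one sweep over all bands, so the output interleaves the bands in time order, which is the serial form the recognition device receives.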

Meanwhile, the palatograph analysis device 4 takes in only the consonant portion of the palatograph signal PG produced by the artificial palate 5 during an utterance, and analyzes the tongue movement from the temporal changes in that information. That is, the artificial palate control device 41 sends out the bit information (for example, 63 bits in total) of the terminals arranged over the entire surface of the artificial palate 5 serially in time at fixed periodic intervals, and the data conversion circuit 42 converts and arranges this information to suit the input format of the consonant group classification device 43 in the following stage.

In this case, for example, the serial bit-string information is packed into 8-bit units and sent out. The consonant group classification device 43 then classifies the input into a consonant group in the manner described next.
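As a rough illustration of packing a serial bit string into 8-bit units, the sketch below converts one 63-bit palate frame into 8 bytes. The bit order (most significant bit first) and the zero padding of the final unit are assumptions; the source does not specify either, and `pack_frame` is a hypothetical name.

```python
from typing import Sequence


def pack_frame(bits: Sequence[int]) -> bytes:
    """Pack a serial bit string into 8-bit units (MSB first, zero padded)."""
    if any(b not in (0, 1) for b in bits):
        raise ValueError("each electrode bit must be 0 or 1")
    # Pad to a multiple of 8 so that a 63-bit frame fills exactly 8 units.
    padded = list(bits) + [0] * (-len(bits) % 8)
    packed = bytearray()
    for start in range(0, len(padded), 8):
        unit = 0
        for bit in padded[start:start + 8]:
            unit = (unit << 1) | bit
        packed.append(unit)
    return bytes(packed)
```

With these assumptions a frame in which all 63 electrodes are touched becomes seven `0xFF` units followed by one `0xFE` unit.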

To begin with, the palatograph is normally used to grasp the position and movement of the tongue, which is one of the important elements that determine a consonant, and it is widely used and studied in phonetics. In this invention, the information on tongue position and movement is pattern-analyzed to classify consonants into four consonant groups. When a palatograph is used, uttered sounds can be broadly divided into those in which the tongue contacts the surface of the artificial palate 5 (such as 's', 'n', 'y' and 'r') and those in which it does not (such as 'w', 'h', 'm', 'b', 'p' and 'g'), and sounds with tongue contact can be classified more finely according to tongue position. However, because tongue movement differs between individuals, and because this invention uses the palatograph only as an aid to speech recognition, the consonants are classified into the four consonant groups shown in Table 1, which can be distinguished almost reliably and are sufficient for the role of an auxiliary device. Table 1 also shows the recognition rate of this grouping.
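The broad split between sounds with and without tongue contact can be sketched as follows. This is only an illustration of that first, coarse distinction, not the four-group classification of Table 1, whose rules are not reproduced in this text. The function names and the `min_electrodes` threshold are assumptions.

```python
from typing import List, Sequence


def contact_profile(frames: Sequence[Sequence[int]]) -> List[int]:
    """Number of touched electrodes in each successive palate frame."""
    return [sum(frame) for frame in frames]


def has_tongue_contact(frames: Sequence[Sequence[int]],
                       min_electrodes: int = 1) -> bool:
    """True if any frame shows at least `min_electrodes` touched electrodes."""
    return any(count >= min_electrodes for count in contact_profile(frames))
```

The per-frame profile keeps the temporal change of the contact pattern, which is the kind of information a finer classification would work from.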

The relationship between tongue movement during utterance and speech power is as shown in FIG. 4. In this invention, only the palatograph pattern corresponding to the consonant part is used as the palatograph pattern, and the device determines which of the four types shown in FIG. 4 this pattern corresponds to. To classify the input into these four groups, the consonant group classification device 43 keeps taking in palatograph information at fixed intervals until it receives the vowel-part start signal C83 from the speech recognition device 3. When the vowel-part start signal C83 is input, it starts the group classification operation described above, prepares the classification result as the consonant group code GC unique to each consonant group, and waits. Then, on receiving the transfer request signal C84 from the speech recognition device 3, which has by then obtained its own recognition result by comparing feature parameters, it outputs the prepared consonant group code GC to the speech recognition device 3.
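The exchange of the two signals C83 and C84 can be sketched as a small state holder. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, the classification rule is passed in as a function because the source text does not reproduce the four-group rules, and raising an error when C84 arrives before C83 is a choice made here, not something the source describes.

```python
from typing import Callable, List, Optional, Sequence

Frame = Sequence[int]


class ConsonantGroupClassifier:
    """Buffers palate frames until C83, classifies, then answers C84."""

    def __init__(self, classify: Callable[[List[Frame]], int]) -> None:
        self._classify = classify          # hypothetical classification rule
        self._frames: List[Frame] = []
        self._code: Optional[int] = None

    def on_frame(self, frame: Frame) -> None:
        # Frames are collected only while the consonant part is in progress.
        if self._code is None:
            self._frames.append(frame)

    def on_vowel_start(self) -> None:
        # C83: the consonant part has ended, so classify what was collected.
        self._code = self._classify(self._frames)

    def on_transfer_request(self) -> int:
        # C84: hand over the prepared group code and reset for the next input.
        if self._code is None:
            raise RuntimeError("transfer requested before vowel start")
        code = self._code
        self._code = None
        self._frames = []
        return code
```

Separating the two signals lets the classification run while the recognition device is still sampling the vowel part, so the group code is ready by the time it is requested.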

The speech recognition device 3 stores the feature parameters of each predetermined monosyllable, together with its monosyllable code, in its built-in storage device. It determines which monosyllables the feature parameters FP of the input speech from the parameter extraction device 2 resemble, and selects the monosyllable codes with at least a predetermined similarity. In detail, the speech recognition device 3 synchronizes the parameter extraction device 2 by means of the control signals C81 and C82 and takes in the band-divided speech signal as the feature parameters FP. It stores the input feature parameters FP sequentially in the internal storage device and, at the same time, determines the vowel onset of the input speech from the values of the feature parameters FP. When the speech recognition device 3 confirms that the input speech has moved from consonant utterance to vowel utterance, it outputs the vowel-part start signal C83 to the palatograph analysis device 4, which has already been taking in palatograph data at fixed intervals. The speech recognition device 3 then continues to sample the feature parameters of the vowel part for a fixed time, so that it has taken in the feature parameters of the whole utterance. Next, it performs whatever preprocessing is needed before the similarity calculation, such as normalization or partial extraction of the feature parameters, and calculates the similarity to the predetermined monosyllable feature parameters by the so-called pattern matching method. It then selects the one or more monosyllable codes whose similarity is at or above a predetermined value. At this point it ends the consonant group classification and sends the transfer request signal C84 to the waiting palatograph analysis device 4 to receive the consonant group code GC. It compares the consonant part of each previously selected monosyllable code (for example, 'k') with the consonant group code GC and picks, from the selected monosyllable codes, those that match the consonant group code GC. If only one matches, that one is the recognition result; if several match, the one with the highest similarity among them is the recognition result. After determining the monosyllable code MC corresponding to the input speech by this procedure, it displays the character corresponding to the monosyllable code MC on the display device 6.
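The final selection step can be sketched in a few lines of Python. The function name, the dictionary-based inputs, and the similarity values are hypothetical. Returning `None` when no candidate matches the consonant group is an assumption, since the source does not say what happens in that case.

```python
from typing import Dict, Optional, Set


def select_monosyllable(similarities: Dict[str, float],
                        consonant_of: Dict[str, str],
                        group: Set[str],
                        threshold: float) -> Optional[str]:
    """Pick the best candidate whose consonant belongs to the given group."""
    # Step 1: keep monosyllables at or above the predetermined similarity.
    candidates = {s: v for s, v in similarities.items() if v >= threshold}
    # Step 2: keep those whose consonant part matches the group code.
    matching = {s: v for s, v in candidates.items()
                if consonant_of[s] in group}
    if not matching:
        return None  # assumption: the source does not cover this case
    # Step 3: among the matches, take the highest similarity.
    return max(matching, key=lambda s: matching[s])


# Illustrative values only; 'ba' stands in for an acoustically close rival.
scores = {"ba": 0.90, "da": 0.85, "na": 0.80, "ka": 0.30}
consonants = {"ba": "b", "da": "d", "na": "n", "ka": "k"}
result = select_monosyllable(scores, consonants, {"t", "d", "n"}, 0.70)
```

In this made-up example the acoustically best candidate is rejected because its consonant lies outside the group, and `result` is `"da"`.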

FIG. 5 schematically shows how the corresponding monosyllable is finally determined, taking as an example the case where 'da' is uttered. At the center of the circle is the feature parameter of the input speech 'da', and the circle represents the predetermined similarity. If the palatograph is not used, the monosyllable with the highest similarity is taken as the corresponding monosyllable, which results in a misrecognition. If, however, 't, d, n' is given as the consonant group code, then of 'da' and 'na' inside the circle the one closer to the input speech is selected, and the input is correctly recognized as 'da'. Thus, by using the information on tongue position and movement from the palatograph as an aid, the recognition rate improves by 2 to 5%.

As described above, according to this invention a palatograph is used as an auxiliary device for speech recognition and consonants are grouped into four types for recognition, so that reliable monosyllable recognition of speech can be achieved.

[Brief explanation of drawings]

FIG. 1 is a block diagram showing an embodiment of this invention; FIG. 2 is a block diagram showing in detail an example of the parameter extraction device used in this invention; FIG. 3 is a block diagram showing in detail an example of the palatograph analysis device used in this invention; FIGS. 4(a) and 4(b) are diagrams for explaining its operation; and FIG. 5 is a diagram showing how a monosyllable is determined according to this invention. 1: microphone; 2: parameter extraction device; 3: speech recognition device; 4: palatograph analysis device; 5: artificial palate; 6: display device.

Claims

[Scope of Claims] A speech recognition device characterized by comprising: a. a parameter extraction device that extracts feature parameters of input monosyllabic speech; b. a palatograph analysis device that analyzes a palatograph signal from an artificial palate and outputs the corresponding consonant group code; c. a speech recognition device that, from among the monosyllable codes having at least a predetermined similarity obtained by comparing the feature parameters extracted by the parameter extraction device with monosyllable feature parameters stored in advance in an internal storage device, outputs as the corresponding monosyllable code the one whose consonant part matches the consonant group code from the palatograph analysis device and whose similarity is highest; and d. a display device that displays the monosyllable code from the speech recognition device.