JPS60237495A

JPS60237495A - Voice recognition equipment

Info

Publication number: JPS60237495A
Application number: JP59093355A
Authority: JP
Inventors: 和彦松尾; 岩橋　弘幸; 充宏斗谷
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 1984-05-09
Filing date: 1984-05-09
Publication date: 1985-11-26
Also published as: JPH0217119B2

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention] <Technical Field> The present invention relates to a Japanese speech recognition device.

<Prior Art> Conventional speech recognition devices have no display of the speaking rate or the discrimination rate. Such a display would give the speaker a guide for improving the recognition rate. Without it, the speaker cannot tell when he or she is speaking in the optimal manner.

<Object of the Invention> An object of the present invention is to provide a speech recognition device that visually displays the speaking rate and the discrimination rate, so that the speaker can speak while working to improve the discrimination rate.

<Embodiment> The configuration of the present invention is described below by way of an embodiment.

FIG. 1 shows a block diagram of the apparatus of the present invention.

A speech signal input to the microphone 1 is amplified and converted to digital form by the analog input section 2. It is then fed to the speech analysis section 3 and the syllable segmentation section 4. The speech analysis section 3 divides the input speech signal into frames of about 16 ms and performs spectral analysis. At intervals of about 8 ms, it sends the syllable segmentation section 4 information such as the feature pattern of the speech, its power, and its zero-crossing count.
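The framing step above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patent's implementation. The source gives no sampling rate, so the 8 kHz value is an assumption. The sketch computes only frame power and zero-crossing count, and it omits the spectral feature pattern. The names `analyze` and `FrameFeatures` are hypothetical.

```python
from dataclasses import dataclass

SAMPLE_RATE_HZ = 8000  # assumption: the source does not state a sampling rate
FRAME_MS = 16          # frame length given in the description
HOP_MS = 8             # output interval given in the description


@dataclass
class FrameFeatures:
    power: float         # mean squared amplitude of the frame
    zero_crossings: int  # number of sign changes between adjacent samples


def analyze(samples: list[float], rate: int = SAMPLE_RATE_HZ) -> list[FrameFeatures]:
    """Split the signal into 16 ms frames taken every 8 ms (50% overlap)."""
    frame_len = rate * FRAME_MS // 1000  # 128 samples at 8 kHz
    hop = rate * HOP_MS // 1000          # 64 samples at 8 kHz
    features = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        # Count adjacent sample pairs whose signs differ (zero counted as positive).
        zc = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
        features.append(FrameFeatures(power, zc))
    return features
```

For one second of a signal that alternates between +1 and -1, `analyze([1.0, -1.0] * 4000)` yields 124 frames. Each frame has power 1.0 and 127 zero crossings.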

The syllable segmentation section 4 extracts syllables on the basis of the various information sent from the speech analysis section 3.

It also distinguishes silent intervals from voiced intervals in the input speech and measures the duration of each interval in 8 ms counts. The measured data are sent to the CPU 5 for use in calculating the speaking rate.
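The silent/voiced interval timing can be illustrated with a short Python sketch. The patent does not say how silence is detected. This sketch assumes a simple per-frame power threshold applied to frames produced every 8 ms. The name `interval_counts` and the threshold parameter are hypothetical. Each run of equal decisions becomes one interval, and its length is the duration in 8 ms counts.

```python
from itertools import groupby


def interval_counts(frame_power: list[float], power_threshold: float) -> list[tuple[bool, int]]:
    """Return alternating (is_voiced, count) runs; each count unit is one 8 ms frame.

    Assumption: a frame is voiced when its power reaches power_threshold.
    """
    flags = [p >= power_threshold for p in frame_power]
    return [(voiced, len(list(run))) for voiced, run in groupby(flags)]
```

For example, `interval_counts([0, 0, 5, 5, 5, 0, 6], 1.0)` gives `[(False, 2), (True, 3), (False, 1), (True, 1)]`. That is a 16 ms silence, a 24 ms voiced interval, and so on.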

The feature patterns of the syllables extracted by the syllable segmentation section 4 pass through the CPU 5 and are stored in the feature pattern memory 7 within the pattern memory 6. The monosyllable recognition section 9 then compares them with standard patterns registered in advance in the standard pattern memory 8. The comparison result is returned to the CPU 5, which stores it in the recognition result memory 10 and shows it on the display device 12. The keyboard 11 is used to correct the displayed recognition result of the input speech.
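The comparison in the monosyllable recognition section 9 can be illustrated as nearest-template matching. The patent states neither the distance measure nor the feature format. This sketch assumes fixed-length numeric feature vectors and Euclidean distance. The name `rank_candidates` and the sample patterns are hypothetical. The ranked list stands in for the multiple recognition candidates, the first of which is the top-ranked result.

```python
import math


def rank_candidates(feature: list[float], standards: dict[str, list[float]]) -> list[str]:
    """Rank registered syllables by Euclidean distance to the input pattern, nearest first.

    Assumption: every standard pattern has the same length as the input feature vector.
    """
    return sorted(standards, key=lambda syllable: math.dist(feature, standards[syllable]))
```

With `rank_candidates([1.0, 0.0], {"ka": [0.9, 0.1], "ha": [0.0, 1.0], "sa": [5.0, 5.0]})` the order is `["ka", "ha", "sa"]`.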

FIG. 2 shows a flowchart of the speaking-rate calculation program of the syllable segmentation section 4.

Silent intervals and voiced intervals alternate. The following symbols are used:

- Pi and Vi are the counts measured for a silent interval and a voiced interval, respectively.
- P is the contents of the register that accumulates the silent-interval counts Pi.
- V is the contents of the register that accumulates the voiced-interval counts Vi.
- l is the threshold for the silent-interval count Pi.

The threshold l is set so that geminate consonants (sokuon) and word-initial syllables are ignored.

In steps S1 and S2, all registers and counters are cleared. While a silent interval is detected, counter Pi is incremented every 8 ms (S5). While a voiced interval is detected, counter Vi is incremented every 8 ms (S6). That is, Pi ← Pi + 1 and Vi ← Vi + 1 are executed. Next, the program tests whether Pi > l. If NO, the accumulations P ← P + Pi and V ← V + Vi are executed (S9, S10), and the count of extracted syllables is then incremented, C ← C + 1 (S11). If YES, steps S9, S10, and S11 are skipped. Finally, in step S12, the average duration T of one syllable and the average speaking rate M per second are calculated.

$$M = \frac{1000}{T}$$

This calculation result is transferred to the CPU.
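The speaking-rate procedure of FIG. 2 can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the patent's program.

- It assumes the interval counts have already been paired as (Pi, Vi), with each silent interval matched to the voiced interval that follows it.
- The source equation for T did not survive, so the sketch assumes T = 8(P + V)/C milliseconds. With that assumption, M = 1000/T is a rate in syllables per second.
- The name `speaking_rate` is hypothetical.

```python
COUNT_MS = 8  # each count of Pi or Vi represents 8 ms


def speaking_rate(pairs: list[tuple[int, int]], threshold: int) -> tuple[float, float]:
    """Return (T, M) from (Pi, Vi) interval counts; threshold is the value l.

    Assumptions: each silent interval is paired with the voiced interval after it,
    and T = 8 * (P + V) / C ms, so M = 1000 / T is syllables per second.
    """
    P = V = C = 0                 # S1, S2: clear registers and counters
    for Pi, Vi in pairs:          # Pi and Vi were counted in S5 and S6
        if Pi > threshold:        # long pause: skip S9 to S11
            continue
        P += Pi                   # S9: accumulate silent counts
        V += Vi                   # S10: accumulate voiced counts
        C += 1                    # S11: one more syllable
    if C == 0:
        raise ValueError("no syllables were accumulated")
    T = COUNT_MS * (P + V) / C    # assumed formula for mean ms per syllable
    M = 1000 / T                  # S12: M = 1000 / T
    return T, M
```

For `speaking_rate([(100, 20), (5, 20), (5, 20)], 10)` the first pair is skipped, which gives T = 200.0 ms and M = 5.0.

Pairing each silence with the voiced interval that follows means a long leading pause drops the word-initial syllable from the average. This is one possible reading of the statement that l is set so that word-initial syllables are ignored.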

FIG. 3 shows a flowchart of the program that calculates the discrimination rate. In step S21, counters C1 and C2 are cleared. Each syllable extracted by the processing described above is input (S22) and recognized (S24), and the recognition result is shown on the display device 12 (S25). This processing continues until the end of the word.

If the displayed content differs from what was spoken, the recognition result is corrected using the keyboard (S27). This correction is made when the first-ranked of the multiple recognition candidates needs to be corrected. After correction, the CPU compares the word given by the first-ranked recognition candidate with the word as confirmed or corrected, syllable by syllable (S28 to S33). It then calculates the discrimination rate K, where C1 is the number of syllables compared and C2 is the number that matched:

$$K = \frac{C_2}{C_1} \times 100\ (\%)$$

The speaking rate M is then calculated, and both values are displayed. For example, suppose "hajime" (はじめ) is spoken and "kajime" (かじめ) is displayed, so the first syllable "ka" is corrected to "ha". The discrimination rate is then calculated and displayed as K = 66.7%.
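The syllable-by-syllable comparison (S28 to S33) and the rate K can be sketched in Python as follows. This sketch assumes that a correction only substitutes syllables, so both words have the same length. It also assumes that C1 counts the syllables compared and C2 counts the matches, which is consistent with the 66.7% example. The function name `discrimination_rate` is hypothetical.

```python
def discrimination_rate(recognized: list[str], corrected: list[str]) -> float:
    """Percentage of syllables where the first-ranked result matched the corrected word.

    Assumption: corrections substitute syllables only, so the lengths are equal.
    """
    if len(recognized) != len(corrected):
        raise ValueError("recognized and corrected words must have the same syllable count")
    c1 = len(corrected)                                       # syllables compared
    c2 = sum(r == c for r, c in zip(recognized, corrected))   # syllables that matched
    return 100.0 * c2 / c1
```

The "hajime" example, `discrimination_rate(["ka", "ji", "me"], ["ha", "ji", "me"])`, returns about 66.7.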

[Brief Description of the Drawings]

FIG. 1 is a block diagram of an embodiment of the present invention, and FIGS. 2 and 3 are flowcharts showing programs of the embodiment.

Patent applicant: Sharp Corporation. Agent: Patent Attorney 西田 新

Claims

[Claims]

A monosyllable recognition device for recognizing Japanese speech in units of syllables, comprising a speech rate calculation means, a discrimination rate calculation means, a speech rate display means, and a discrimination rate display means.