JPS63265299A

JPS63265299A - Voice recognition equipment

Info

Publication number: JPS63265299A
Application number: JP62099369A
Authority: JP
Inventors: 柳瀬　憲治
Original assignee: Brother Industries Ltd
Current assignee: Brother Industries Ltd
Priority date: 1987-04-22
Filing date: 1987-04-22
Publication date: 1988-11-01

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION [Field of Industrial Application] The present invention relates to a speech recognition device that sequentially recognizes speech information syllable by syllable.

[Prior Art] Conventionally, a speech recognition device of this type comprises syllable dividing means for dividing input speech information into syllables, vowel recognition means for recognizing the vowel part of each divided syllable, consonant standard pattern storage means storing various consonant standard patterns, and consonant recognition means for recognizing the consonant part of each divided syllable. The consonant recognition means recognizes the consonant part by matching the feature pattern of the consonant part of the divided syllable against the consonant standard patterns.

[Problems to be Solved by the Invention] In the consonant recognition means described above, the feature pattern of the consonant part of each divided syllable has to be matched against all of the consonant standard patterns, so the amount of processing and the time required for consonant recognition were enormous.

[Object of the Invention] The present invention has been made to solve the problem described above. Its object is to reduce the number of matching operations in the consonant recognition means and thereby reduce the amount of processing and the time required for recognition.

[Means for Solving the Problems] As shown in FIG. 1, the speech recognition device of the present invention comprises syllable dividing means 1 for dividing input speech information into syllables, vowel recognition means 2 for recognizing the vowel part of each divided syllable, consonant standard pattern storage means 6 storing various consonant standard patterns, and consonant recognition means 5 for recognizing the consonant part of a syllable by matching the feature pattern of the consonant part of the divided syllable against the consonant standard patterns. To achieve the above object, the device further comprises: syllable string creation means 7 for sequentially storing, in time series, the recognition results of the syllables recognized by the vowel recognition means 2 and the consonant recognition means 5; dictionary means 4 storing information on various words or phrases; and consonant candidate creation means 3 that collates the combination of the time series of syllable recognition results stored in the syllable string creation means 7 and the recognition result of the vowel part of the next syllable following the last recognized syllable against the information stored in the dictionary means 4, and creates consonant candidates for the next syllable from the collation result. The device is characterized in that the consonant recognition means 5 matches the feature pattern of the consonant part of the next syllable only against the standard patterns of the consonant candidates among the various consonant standard patterns.
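The restriction of matching to the consonant candidates can be sketched as follows. This is a minimal illustration, not the patent's implementation: the two-dimensional feature vectors, the pattern values, and the similarity measure (negative squared Euclidean distance) are all assumptions, since the publication does not specify how feature patterns are represented or compared.

```python
from typing import Dict, List, Sequence, Tuple

Pattern = Sequence[float]


def similarity(a: Pattern, b: Pattern) -> float:
    """Assumed measure: negative squared Euclidean distance (larger = more similar)."""
    return -sum((x - y) ** 2 for x, y in zip(a, b))


def match_consonant(
    feature: Pattern,
    standard_patterns: Dict[str, Pattern],
    candidates: Sequence[str],
) -> List[Tuple[str, float]]:
    """Match the feature pattern only against the candidates' standard patterns.

    Returns (consonant, similarity) pairs sorted from most to least similar.
    """
    scored = [(c, similarity(feature, standard_patterns[c])) for c in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


# Hypothetical standard patterns (stand-in for storage means 6).
STANDARD_PATTERNS: Dict[str, Pattern] = {
    "h": [0.0, 1.0],
    "k": [1.0, 0.0],
    "m": [1.0, 1.0],
    "n": [0.8, 0.6],
    "s": [0.0, 0.0],
    "d": [0.5, 0.2],
    "t": [0.2, 0.5],
}

feature = [0.9, 1.0]  # hypothetical feature pattern of a consonant part
restricted = match_consonant(feature, STANDARD_PATTERNS, ["m", "n"])
```

With two candidates, only two of the seven stored patterns are matched; the conventional approach would call `similarity` once for every entry in `STANDARD_PATTERNS`.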

[Operation] The vowel recognition means 2 recognizes the vowel part of the syllable divided by the syllable dividing means 1. Based on the time series of already recognized syllables and the recognition result of the vowel part, the consonant candidate creation means 3 creates, as consonant candidates for the divided syllable, those consonants that could form a word or phrase stored in the dictionary means 4. The consonant recognition means 5 recognizes the consonant part of the divided syllable using only the consonant candidates as matching targets. The syllable string creation means 7 constructs and stores a new time series from the recognition result of the syllable recognized by the vowel recognition means 2 and the consonant recognition means 5 and the already stored time series of syllable recognition results.

[Embodiment] An embodiment of the present invention is described below with reference to FIGS. 2 to 5.

The speech recognition device of this embodiment sequentially recognizes, syllable by syllable, speech information that is input word by word. As shown in FIG. 2, it consists of a CPU (central processing unit) 10 and, connected to it, a ROM (read-only memory) 20, a RAM (random access memory) 30, and an input/output port 40. The ROM 20 stores a program 21 describing the processing operations of the CPU 10, vowel standard patterns 22, consonant standard patterns 23, and a word dictionary 24 in which various words are described as phoneme sequences. In the RAM 30 are formed a speech information area 31, a syllable section pointer 32, a vowel register 33, a consonant candidate area 34, and a syllable string area 35. The syllable string area 35 consists of three rows, as shown in FIG. 5, and each row stores a syllable string and its total similarity. The consonant candidate area 34 likewise consists of three rows, as shown in the same figure, and each row stores the consonant candidates obtained from the syllable string stored in the corresponding row of the syllable string area 35, together with their similarities. In this embodiment, a collection of city names is used as the word dictionary 24. As shown in FIG. 4, the word dictionary 24 has a tree structure in which phonemes are nodes and their connections are represented by branches. In addition, so that consonant candidates can easily be retrieved from the vowel recognition result, the vowel and consonant of each syllable are written in the reverse of the usual romanized order.
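A tree-structured dictionary with the vowel stored ahead of the consonant in each syllable can be sketched as below. This is an illustrative sketch only: the nested-dict representation, the `END` marker, the function names, and the list of city names are assumptions, since the actual contents of FIG. 4 are not reproduced in this text.

```python
from typing import Dict, List, Sequence, Tuple

Syllable = Tuple[str, str]  # (consonant, vowel)
END = "$"                   # hypothetical end-of-word marker


def build_dictionary(words: Sequence[Sequence[Syllable]]) -> Dict:
    """Build a phoneme tree in which each syllable is stored vowel first, then consonant."""
    root: Dict = {}
    for syllables in words:
        node = root
        for consonant, vowel in syllables:
            node = node.setdefault(vowel, {})      # vowel node comes first
            node = node.setdefault(consonant, {})  # consonant node hangs below it
        node[END] = True
    return root


def consonant_candidates(root: Dict, prefix: Sequence[Syllable], vowel: str) -> List[str]:
    """Return the consonants that can follow `prefix` given the next syllable's vowel."""
    node = root
    for consonant, v in prefix:
        node = node.get(v, {}).get(consonant)
        if node is None:
            return []
    # Because the vowel is stored first, the candidates are simply its child nodes.
    return sorted(node.get(vowel, {}))


# Hypothetical city-name dictionary, written as (consonant, vowel) syllables.
WORDS = [
    [("h", "a"), ("m", "a"), ("d", "a")],                # hamada
    [("h", "a"), ("m", "a"), ("m", "a"), ("ts", "u")],   # hamamatsu
    [("h", "a"), ("n", "a"), ("m", "a"), ("k", "i")],    # hanamaki
    [("s", "a"), ("k", "a"), ("t", "a")],                # sakata
    [("k", "a"), ("n", "a"), ("z", "a"), ("w", "a")],    # kanazawa
    [("n", "a"), ("g", "o"), ("y", "a")],                # nagoya
    [("m", "a"), ("ts", "u"), ("y", "a"), ("m", "a")],   # matsuyama
]

root = build_dictionary(WORDS)
first = consonant_candidates(root, [], "a")
second = consonant_candidates(root, [("h", "a")], "a")
```

Storing the vowel first means a lookup by the recognized vowel lands directly on the node whose children are the admissible consonants, with no scan over sibling consonant nodes.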

Next, the processing operation of the CPU 10 in the above configuration is described with reference to the flowchart of FIG. 3.

Now consider the case where "Hamada" (/hamada/) is given as the word speech. The speech information input to the input/output port 40 is stored in the speech information area 31 in step S1. In step S2, the first syllable /ha/ is extracted from the speech information, and the addresses in the speech information area 31 that hold the speech information corresponding to the start and end of that syllable are stored in the syllable section pointer 32. For the first syllable indicated by the syllable section pointer 32, the feature pattern of its vowel part /a/ is matched against the vowel standard patterns 22 in step S3, and the recognition result /a/ is stored in the vowel register 33, as shown in FIG. 5(a). When the first syllable is recognized, no syllable string is yet stored in the syllable string area 35, so in step S4 the consonants /h/, /k/, /m/, /n/, and /s/ that connect to the first-syllable vowel /a/ in the word dictionary 24 are stored as consonant candidates in the first row of the consonant candidate area 34. In step S5, for the consonant candidates stored in the consonant candidate area 34, the feature pattern of the consonant part of the first syllable indicated by the syllable section pointer 32 is matched against the standard patterns of those candidates in the consonant standard patterns 23, and the resulting similarities are stored in the consonant candidate area 34 following each consonant candidate [see FIG. 5(a)]. In step S6, the top three consonant candidates in descending order of similarity are stored, together with the vowel held in the vowel register 33, as syllable strings in the rows of the syllable string area 35, and at the same time the similarity of each consonant candidate is stored as the total similarity [see FIG. 5(b)]. In step S7, it is determined whether the syllable is the last one; since it is not, the process returns to step S2.

In step S2, the second syllable /ma/ is extracted. In step S3, the vowel part of the second syllable is recognized and the vowel /a/ is stored in the vowel register 33 [see FIG. 5(b)]. In step S4, starting with /ha/ in the first row of the syllable string area 35, the consonants /m/ and /n/ that connect to the second-syllable vowel /a/ are stored as consonant candidates for the second syllable in the first row of the consonant candidate area 34; consonant candidates for the syllable strings in the second and third rows are stored in the consonant candidate area 34 in the same way. In step S5, the consonant part of the second syllable is matched against each consonant candidate, without matching the same candidate twice, and the similarities are stored in the consonant candidate area 34 [see FIG. 5(b)]. In step S6, for each combination of a syllable string and a consonant candidate obtained from that syllable string, the sum of the total similarity of the syllable string and the similarity of the consonant candidate is computed. The top three combinations in descending order of that sum are stored, together with the vowel, as new syllable strings in the rows of the syllable string area 35, and at the same time each sum is stored as the total similarity of the new syllable string [see FIG. 5(c)].

From step S7 the process returns again to step S2, and the processing of steps S2 through S6 described above is performed for the third syllable /da/ [see FIG. 5(c)]. When step S7 determines that this is the last syllable, then in step S8, among the syllable strings /hamada/, /hamama/, and /sakata/ stored in the syllable string area 35 [see FIG. 5(d)], the syllable string /hamada/, which has the highest total similarity among those that are complete words (/hamada/ and /sakata/), is output from the input/output port 40 as the recognition result.
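The loop over steps S2 to S8 can be sketched as a small beam search that keeps three syllable strings. This is a hedged illustration, not the program 21 itself: the dictionary contents, the integer similarity values, and all function names are hypothetical, and the acoustic matching of step S5 is replaced by a lookup in a precomputed table of consonant similarities.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Syllable = Tuple[str, str]  # (consonant, vowel)
END = "$"                   # hypothetical end-of-word marker


def build_dictionary(words: Sequence[Sequence[Syllable]]) -> Dict:
    """Phoneme tree with each syllable stored vowel first, then consonant."""
    root: Dict = {}
    for syllables in words:
        node = root
        for consonant, vowel in syllables:
            node = node.setdefault(vowel, {}).setdefault(consonant, {})
        node[END] = True
    return root


def walk(root: Dict, prefix: Sequence[Syllable]) -> Optional[Dict]:
    """Follow a syllable string down the tree; None if it is not in the dictionary."""
    node: Optional[Dict] = root
    for consonant, vowel in prefix:
        node = node.get(vowel, {}).get(consonant)
        if node is None:
            return None
    return node


def recognize(root: Dict, observations, beam_width: int = 3):
    """observations: one (vowel, {consonant: similarity}) pair per syllable."""
    beam: List[Tuple[Tuple[Syllable, ...], int]] = [((), 0)]
    for vowel, table in observations:
        matched: Dict[str, int] = {}  # step S5: match each candidate only once
        extended = []
        for prefix, total in beam:
            node = walk(root, prefix) or {}
            for consonant in node.get(vowel, {}):            # step S4: candidates
                if consonant not in matched:
                    matched[consonant] = table.get(consonant, 0)
                extended.append((prefix + ((consonant, vowel),),
                                 total + matched[consonant]))
        extended.sort(key=lambda item: item[1], reverse=True)
        beam = extended[:beam_width]                         # step S6: keep top 3
    complete = [(p, t) for p, t in beam if END in (walk(root, p) or {})]
    if not complete:
        return None
    best, total = max(complete, key=lambda item: item[1])    # step S8
    return "".join(c + v for c, v in best), total


WORDS = [
    [("h", "a"), ("m", "a"), ("d", "a")],               # hamada
    [("h", "a"), ("m", "a"), ("m", "a"), ("ts", "u")],  # hamamatsu
    [("h", "a"), ("n", "a"), ("m", "a"), ("k", "i")],   # hanamaki
    [("s", "a"), ("k", "a"), ("t", "a")],               # sakata
    [("k", "a"), ("n", "a"), ("z", "a"), ("w", "a")],   # kanazawa
]
root = build_dictionary(WORDS)

# Hypothetical similarities for the three syllables of /hamada/.
observations = [
    ("a", {"h": 90, "s": 70, "k": 50}),
    ("a", {"m": 80, "k": 70, "n": 40}),
    ("a", {"d": 80, "t": 60, "m": 50}),
]
result = recognize(root, observations)
```

With these made-up values the final beam holds /hamada/, /hamama/, and /sakata/, and /hamama/ is dropped in the last step because it is not a complete word in the assumed dictionary.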

[Effects of the Invention] As described in detail above, the speech recognition device of the present invention uses as matching targets only those consonants that, based on the recognition results of the already recognized syllables and the recognition result of the vowel part, could form a word or phrase stored in the dictionary means. This avoids unnecessary matching and has the advantage of reducing the amount of processing and the time required for consonant recognition. Furthermore, because each syllable can be recognized as soon as it is input, the invention may be applicable to a device that recognizes, in real time, speech information input continuously in sentence units without being segmented into words beforehand, analyzing the structure of the sentence from the recognition results of the individual syllables.

[Brief explanation of drawings]

FIG. 1 is a block diagram showing the configuration of the speech recognition device of the present invention; FIG. 2 is a block diagram showing an embodiment of the present invention; FIG. 3 is a flowchart showing the processing operation of the CPU in this embodiment; FIG. 4 is a diagram showing part of the stored contents of the word dictionary used in this embodiment; and FIG. 5 is a diagram showing the changes in the stored contents of the vowel register, the consonant candidate area, and the syllable string area when word recognition is performed in this embodiment.

1: syllable dividing means, 2: vowel recognition means, 3: consonant candidate creation means, 4: dictionary means, 5: consonant recognition means, 6: consonant standard pattern storage means, 7: syllable string creation means.

Claims

[Scope of Claims]
1. A speech recognition device that sequentially recognizes input speech information syllable by syllable, comprising: syllable dividing means (1) for dividing input speech information into syllables; vowel recognition means (2) for recognizing the vowel part of each divided syllable; consonant standard pattern storage means (6) storing various consonant standard patterns; and consonant recognition means (5) for recognizing the consonant part of a syllable by matching the feature pattern of the consonant part of the divided syllable against the consonant standard patterns, the speech recognition device further comprising: syllable string creation means (7) for sequentially storing, in time series, the recognition results of the syllables recognized by the vowel recognition means (2) and the consonant recognition means (5); dictionary means (4) storing information on various words or phrases; and consonant candidate creation means (3) that collates the combination of the time series of syllable recognition results stored in the syllable string creation means (7) and the recognition result of the vowel part of the next syllable following the last recognized syllable against the information stored in the dictionary means (4), and creates consonant candidates for the next syllable from the collation result, wherein the consonant recognition means (5) matches the feature pattern of the consonant part of the next syllable only against the standard patterns of the consonant candidates among the various consonant standard patterns.
2. The speech recognition device according to claim 1, wherein the consonant recognition means (5) obtains, by matching, the similarity of the standard pattern of each consonant candidate to the feature pattern of the consonant part of the next syllable and outputs the similarity as a recognition result, and the syllable string creation means (7) constructs new time series from the combinations of each already stored time series of syllable recognition results, the recognition result of the vowel part of the next syllable, and each consonant candidate obtained from that combination, and stores the new time series and their total similarities up to a predetermined rank in descending order of similarity.