JPH0338699A

JPH0338699A - Speech recognition device

Info

Publication number: JPH0338699A
Application number: JP1173630A
Authority: JP
Inventors: Mitsuhiro Toya; 充宏斗谷; Shigeyoshi Ono; 小野　茂良
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 1989-07-05
Filing date: 1989-07-05
Publication date: 1991-02-19

Abstract

PURPOSE: To recognize speech over an extremely large vocabulary with good accuracy by providing a display part and plural recognition parts that differ in recognition unit, identifying the input speech in phoneme units, and having each recognition part match the identification results against a dictionary so that the speech is recognized in the recognition unit of that part. CONSTITUTION: An analog speech signal input continuously to an analog input part 1 is passed to a speech analysis part 2, which generates speech feature parameters. The feature parameters are identified in phoneme units by pattern matching or the like. A phrase recognition part 8 then matches combinations of syllable candidates in the successive syllable recognition results against the contents of a phrase recognition dictionary 9 to recognize the input speech in phrase units. A word recognition part 10 matches the entries of a word recognition dictionary 11 against the group of syllable candidates in the successive syllable recognition results to recognize the input speech in word units. The speech is thus recognized in units that span an extremely large vocabulary, such as phrases.

Description

DETAILED DESCRIPTION OF THE INVENTION <Industrial Application Field> The present invention relates to a speech recognition device that recognizes input speech in meaningful units such as words or phrases.

<Prior Art> Conventional speech recognition devices include devices that use phonemes as the identification unit. Because such a device performs recognition by combining phoneme identification results (bottom-up processing), it can recognize a very large vocabulary, including conjugated forms, in units such as phrases, and is therefore suited to text input.

However, when such a phoneme-based device performs recognition in phrase units, it is difficult to raise phrase recognition accuracy beyond a certain level. Attempts have therefore been made to run word-based identification in parallel with phoneme-based identification and to recognize phrases from both sets of results (for example, "Frequent Word Spotting in Continuous Speech Recognition," Proceedings of the Autumn 1986 (Showa 61) Meeting of the Acoustical Society of Japan, 2-31O1; JP-A-63-153595; etc.).

<Problems to be Solved by the Invention> However, a speech recognition device that performs phoneme-based and word-based identification in parallel must carry out identification with words as the identification unit. When words are the identification unit, identification takes time, and yet at most about 1,000 words can be identified.

Moreover, every word to be identified must be uttered once so that a standard word pattern consisting of the feature parameters of that word can be registered in the dictionary, which is inefficient.

Accordingly, an object of the present invention is to provide a speech recognition device that uses phonemes as the identification unit and can recognize very-large-vocabulary speech efficiently and accurately.

<Means for Solving the Problems> To achieve the above object, the present invention provides a speech recognition device that acoustically analyzes an input speech signal to convert it into feature parameters, identifies the feature parameters in phoneme units by a predetermined method, and recognizes speech based on the identification results. The device is characterized by comprising a plurality of recognition sections, each of which recognizes the input speech by matching against a dictionary in a meaningful recognition unit such as words or phrases based on the phoneme-level identification results, and each of which has a different recognition unit, together with a display section that displays the recognition results of each recognition section, so that recognition processes with different recognition units can be used together.

It is also desirable that the dictionary each recognition section uses to recognize speech be organized to match that section's recognition unit and be provided separately for each recognition section.

It is further desirable to provide switching means for selecting which recognition sections are operated when recognition processing is performed.

It is further desirable that the display means have a single window divided into a plurality of areas, each area displaying the recognition candidates from a different recognition section, and that a desired recognition candidate can be selected from the displayed candidates by recognition candidate selection means.

It is further desirable that the display means have a plurality of windows, each displaying the recognition candidates from a different recognition section, that a desired window be selected from among them by window selection means, and that a desired recognition candidate be selectable from the candidates displayed in the selected window by recognition candidate selection means.

<Operation> When a speech signal is input, it is acoustically analyzed and converted into feature parameters, and the feature parameters are identified in phoneme units by pattern matching or the like. The phoneme-level identification results are then input to the plurality of recognition sections, each of which recognizes the input speech by matching against a dictionary or the like in its own meaningful recognition unit, such as words or phrases. The recognition results of each recognition section are then displayed on the display section.

Therefore, speech can be recognized by using the plurality of recognition sections to combine recognition processes with different recognition units.

Further, if the dictionary used by each recognition section is organized to match that section's recognition unit and provided for each recognition section, and the recognition sections to be operated can be switched by the switching means, the recognition sections can be combined according to the content of the input speech, so that speech is recognized even more efficiently.

Further, if one window opened on the display means is divided into a plurality of areas, each area displays the recognition candidates from a different recognition section, and a desired candidate can be selected from the displayed candidates by the recognition candidate selection means, the correct recognition candidate can be selected easily.

Further, if each of a plurality of windows opened on the display means displays the recognition candidates from a different recognition section, a desired window is selected from the opened windows by the window selection means, and a desired candidate can be selected from the candidates displayed in the selected window by the recognition candidate selection means, the correct recognition candidate can be selected even more easily and efficiently.

<Embodiment> The present invention will now be described in detail with reference to the illustrated embodiment.

In this invention, continuously uttered speech is segmented into syllable units and identified, and two kinds of recognition processing with different recognition units, word recognition and phrase recognition, are performed on the resulting syllable lattice. The word recognition results and the phrase recognition results are used together to recognize very-large-vocabulary speech.

FIG. 1 is a block diagram showing an embodiment of the speech recognition device of the present invention. An analog speech signal input continuously to the analog input section 1 is amplified by an amplifier (AMP) 11, converted into a digital speech signal by an A/D converter 12, and output to the speech analysis section 2. In the speech analysis section 2, the digital speech signal undergoes spectral analysis in 16 ms frames at an 8 ms period, and acoustic feature parameters are generated. The generated feature parameters are output to the syllable segment extraction section 3 together with the information needed to extract syllable segments (power, number of zero crossings, etc.).
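The framing described above (16 ms frames advanced every 8 ms, with power and zero-crossing counts passed on for segmentation) can be sketched as follows. This is only an illustration under stated assumptions: the 8 kHz sample rate, the function name `frame_features`, and the synthetic sine-wave input are not taken from the patent, and the spectral analysis itself is omitted.

```python
import math

def frame_features(samples, sample_rate=8000, frame_ms=16, shift_ms=8):
    """Split samples into overlapping frames; return (power, zero_crossings) per frame."""
    frame_len = sample_rate * frame_ms // 1000   # samples in one 16 ms frame
    shift = sample_rate * shift_ms // 1000       # samples in one 8 ms step
    features = []
    for start in range(0, len(samples) - frame_len + 1, shift):
        frame = samples[start:start + frame_len]
        # Mean squared amplitude of the frame.
        power = sum(x * x for x in frame) / frame_len
        # Count sign changes between consecutive samples.
        zero_crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)
        )
        features.append((power, zero_crossings))
    return features

# One second of a 100 Hz sine wave sampled at 8 kHz (synthetic input, assumed).
signal = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(8000)]
feats = frame_features(signal)
print(len(feats), feats[0])
```

Because the frame length is twice the shift, consecutive frames overlap by half, which is what the 16 ms / 8 ms figures in the text imply.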

The feature parameters input from the speech analysis section 2 to the syllable segment extraction section 3 are then cut out in syllable units based on the information needed to extract syllable segments. Each syllable feature pattern cut out in this way is stored in the feature pattern memory 71 of the pattern memory 7 together with syllable segment information (syllable length, the lengths of silence between it and the immediately preceding and following syllables, etc.). The pattern memory 7 consists of the feature pattern memory 71, which stores exactly one syllable feature pattern, corresponding to one syllable cut out by the syllable segment extraction section 3, and a standard pattern memory 72, which stores a plurality of standard patterns made up of acoustic feature parameters of reference syllables prepared in advance.

The monosyllable recognition section 6 calculates the similarity between the one-syllable feature pattern stored in the feature pattern memory 71 and the standard patterns stored in the standard pattern memory 72, and the monosyllable cut out by the syllable segment extraction section 3 is identified from the result. The phrase recognition section 8 then matches combinations of syllable candidates in the successive syllable recognition results (the syllable lattice) against the contents of the phrase recognition dictionary 9, so that the input speech is recognized in phrase units. Meanwhile, the word recognition section 10 matches the entries of the word recognition dictionary 11 against the group of syllable candidates in the successive syllable recognition results (the syllable lattice), so that the input speech is recognized in word units. In other words, speech is recognized in very-large-vocabulary units such as phrases. The recognition results of the phrase recognition section 8 (phrase candidates) and of the word recognition section 10 (word candidates) are displayed on the display section 12. The operator then uses the keyboard 13 to select the correct recognition candidate from the displayed phrase and word candidates, and the selected candidate is output.

The CPU (central processing unit) 5 controls the analog input section 1, speech analysis section 2, syllable segment extraction section 3, monosyllable recognition section 6, pattern memory 7, phrase recognition section 8, word recognition section 10, display section 12, and keyboard 13 to execute the speech recognition processing. The CPU memory 4 stores the program for the CPU 5 and the information needed for speech recognition processing.

FIG. 2 is a flowchart of the speech recognition processing executed under the control of the CPU 5.

The speech recognition processing is described in detail below with reference to FIG. 2.

In step S1, the feature parameters generated by the speech analysis section 2 are cut out in syllable units by the syllable segment extraction section 3. If more than one syllable segment candidate exists, all of the candidates are cut out.

Syllable segments can be cut out in various ways; one example is as follows. Syllable boundaries are detected from power dips and from spectral change points accompanied by an increase or decrease in power.

If the number of syllables contained in the section bounded by detected syllable boundaries is at least twice the minimum value of the estimated average frame length, the section is regarded as containing a plurality of syllable segments, and a plurality of syllable segment candidates are cut out based on the spectral change points within the section (FIG. 3).
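The power-dip part of the boundary detection described above might look like the following sketch. The patent does not say how a dip is defined, so the rule used here (a strict local minimum whose power is at most `min_depth` times the smaller of the peak powers on either side) is purely an assumption, as are the function name and the sample values. Detection of spectral change points is not covered.

```python
def find_power_dips(power, min_depth=0.5):
    """Return frame indices of local power minima that are sufficiently deep.

    A frame counts as a dip when its power is a strict local minimum and is at
    most `min_depth` times the smaller of the highest powers found on each side.
    """
    dips = []
    for i in range(1, len(power) - 1):
        if power[i] < power[i - 1] and power[i] < power[i + 1]:
            left_peak = max(power[:i])
            right_peak = max(power[i + 1:])
            if power[i] <= min_depth * min(left_peak, right_peak):
                dips.append(i)
    return dips

# Per-frame power values invented for illustration.
frame_power = [0.1, 0.8, 1.0, 0.9, 0.2, 0.7, 1.1, 0.6, 0.5, 0.55, 0.1]
boundaries = find_power_dips(frame_power)
print(boundaries)
```

With these values only the deep minimum at frame 4 qualifies; the shallow minimum at frame 8 is rejected because the power to its right never rises far enough above it.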

In step S2, the syllable segment candidates cut out in step S1 are expanded to generate a plurality of syllable segment sequences (FIG. 4). Each syllable segment sequence is given a reliability based on the certainty of each syllable boundary candidate.
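One way to picture the expansion in step S2 is as an enumeration of every chain of segment candidates that covers the utterance without gaps or overlaps. The sketch below is an assumption-laden illustration: the tuple layout, the frame positions, the confidence values, and the choice of multiplying confidences to obtain a sequence reliability are all hypothetical. The candidate names S0 to S4 follow the worked example given later in the description.

```python
def expand_sequences(candidates, start, end):
    """Enumerate every chain of candidates that tiles [start, end) exactly.

    `candidates` is a list of (name, seg_start, seg_end, confidence) tuples
    with seg_end > seg_start. Returns (reliability, [names]) pairs, best first.
    """
    results = []

    def walk(position, names, reliability):
        if position == end:
            results.append((reliability, names))
            return
        for name, seg_start, seg_end, confidence in candidates:
            if seg_start == position:
                walk(seg_end, names + [name], reliability * confidence)

    walk(start, [], 1.0)
    return sorted(results, key=lambda r: r[0], reverse=True)

# Frame positions and confidences are invented for illustration.
candidates = [
    ("S0", 0, 10, 1.0),
    ("S1", 10, 18, 1.0),
    ("S2", 18, 40, 0.5),   # remainder taken as a single segment
    ("S3", 18, 30, 0.5),   # remainder split in two: first half
    ("S4", 30, 40, 0.5),   # remainder split in two: second half
]
sequences = expand_sequences(candidates, 0, 40)
for reliability, names in sequences:
    print(reliability, "-".join(names))
```

With these made-up numbers the unsplit sequence S0-S1-S2 ranks above S0-S1-S3-S4, which mirrors the ordering assumed in the patent's own example.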

In step S3, the feature pattern of one syllable segment from the syllable segment sequences generated in step S2 is stored in the feature pattern memory 71, monosyllable recognition is performed by the monosyllable recognition section 6 as described above, and syllable candidates are output. In this way, the syllable segment sequences are recognized one syllable segment at a time.

In step S4, a syllable lattice is generated for each syllable segment sequence from the results of the monosyllable recognition in step S3. Each syllable candidate in the lattice is given a reliability based on its similarity to the standard pattern, and the candidates are arranged in descending order of reliability (FIG. 5).
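Steps S3 and S4 amount to scoring each segment's feature pattern against every standard pattern and keeping a ranked candidate list per segment. The following sketch assumes a similarity measure (negative squared Euclidean distance) and toy two-dimensional patterns; the patent does not specify either, and the names `rank_candidates` and `build_lattice` are hypothetical.

```python
def rank_candidates(feature, standard_patterns, top_n=3):
    """Return the top_n (syllable, reliability) pairs for one segment.

    Reliability here is the negative squared Euclidean distance, so a larger
    value means a closer match to the standard pattern.
    """
    scored = []
    for syllable, pattern in standard_patterns.items():
        distance = sum((f - p) ** 2 for f, p in zip(feature, pattern))
        scored.append((syllable, -distance))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]

def build_lattice(segment_features, standard_patterns, top_n=3):
    """Build one ranked candidate list per syllable segment."""
    return [rank_candidates(f, standard_patterns, top_n) for f in segment_features]

# Toy two-dimensional patterns; real feature patterns would be spectral.
standard_patterns = {
    "ha": (0, 0), "ka": (1, 0), "i": (5, 5), "te": (9, 1), "ke": (8, 2),
}
segment_features = [(0.2, 0.0), (5.0, 4.0), (9.0, 1.2)]
lattice = build_lattice(segment_features, standard_patterns, top_n=2)
for column in lattice:
    print([syllable for syllable, _ in column])
```

Each inner list plays the role of one column of the syllable lattice, with the most reliable candidate first.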

In step S5, from among the syllable lattices generated in step S4 for the respective syllable segment sequences, the lattice corresponding to the sequence with the highest reliability (hereinafter, the first-ranked syllable lattice) is sent to the phrase recognition section 8, which is at the same time instructed to execute phrase processing.

The phrase recognition section 8 then creates character strings by combining the syllable candidates of the input first-ranked syllable lattice and applies linguistic processing such as matching against the phrase recognition dictionary 9. Only grammatically valid character strings are output as recognition results (phrase candidates), together with their cumulative reliability and in order of cumulative reliability (FIG. 6). Phrase recognition can thus be regarded as bottom-up processing.
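The bottom-up character of phrase recognition can be illustrated by generating every combination of one candidate per segment and filtering the resulting strings. In this sketch a plain set stands in for the phrase recognition dictionary 9 and its grammatical checks, cumulative reliability is assumed to be a simple sum, and all reliability values are invented.

```python
from itertools import product

def phrase_candidates(lattice, is_valid_phrase):
    """Combine one candidate per segment and keep strings that pass the check.

    `lattice` is a list of [(syllable, reliability), ...] lists, one per segment.
    Cumulative reliability is taken as the sum of per-syllable reliabilities.
    """
    results = []
    for combo in product(*lattice):
        text = "".join(syllable for syllable, _ in combo)
        score = sum(reliability for _, reliability in combo)
        if is_valid_phrase(text):
            results.append((text, score))
    return sorted(results, key=lambda r: r[1], reverse=True)

# Reliabilities are made up; romanized syllables are used for readability.
lattice0 = [
    [("ha", 0.9), ("ka", 0.6)],
    [("i", 0.8)],
    [("te", 0.7), ("ke", 0.5)],
]
phrase_dictionary = {"haite", "kaite"}   # hypothetical stand-in for dictionary 9
ranked = phrase_candidates(lattice0, phrase_dictionary.__contains__)
for text, score in ranked:
    print(text, round(score, 2))
```

The number of combinations grows multiplicatively with the number of candidates per segment, which is one reason a practical system would prune rather than enumerate exhaustively as this sketch does.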

In step S6, all of the syllable lattices generated in step S4 are sent to the word recognition section 10, which matches them against the word recognition dictionary 11 and outputs word candidates.

The dictionary matching is performed as follows. For each entry (word) in the word recognition dictionary 11, it is determined whether there is a syllable lattice whose syllable candidates include every syllable of that entry. If such a lattice exists, the entry is output as a recognition result (word candidate), together with its cumulative reliability and in order of cumulative reliability (FIG. 7). Word recognition can thus be regarded as top-down processing.
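The top-down matching just described starts from the dictionary entries rather than from the lattice. The sketch below makes one assumption that is narrower than the text: the i-th syllable of an entry must appear among the candidates of the i-th segment of a lattice of the same length. The data layout, the summed score, and the dictionary contents are likewise hypothetical.

```python
def match_words(lattices, word_dictionary):
    """Return (word, lattice_number, score) for entries matched by some lattice.

    `lattices` maps a lattice number to a list of {syllable: reliability} dicts,
    one dict per segment. `word_dictionary` maps a word to its syllable list.
    """
    results = []
    for word, syllables in word_dictionary.items():
        for number, lattice in lattices.items():
            if len(lattice) != len(syllables):
                continue
            # Every syllable of the entry must be a candidate at its position.
            if all(s in segment for s, segment in zip(syllables, lattice)):
                score = sum(segment[s] for s, segment in zip(syllables, lattice))
                results.append((word, number, score))
    return sorted(results, key=lambda r: r[2], reverse=True)

# Reliabilities are invented for illustration.
lattices = {
    0: [{"ha": 0.9, "ka": 0.6}, {"i": 0.8}, {"te": 0.7, "ke": 0.5}],
    1: [{"ha": 0.9, "ka": 0.6}, {"i": 0.8}, {"ke": 0.6, "te": 0.4}, {"i": 0.7}],
}
word_dictionary = {                      # hypothetical stand-in for dictionary 11
    "haikei": ["ha", "i", "ke", "i"],
    "kaite": ["ka", "i", "te"],
    "toukyou": ["to", "u", "kyo", "u"],
}
words = match_words(lattices, word_dictionary)
for word, number, score in words:
    print(word, number, round(score, 2))
```

Each result keeps the lattice number it came from, in the same way the description associates word candidates with syllable lattice numbers.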

In step S7, the phrase candidates output in step S5 and the word candidates output in step S6 are displayed on the display section 12 (FIG. 8 or FIG. 9).

In step S8, with reference to the display in step S7, the correct recognition candidate for the input speech is selected from the phrase and word candidates with the keyboard 13, the recognition candidate is confirmed, and the speech recognition processing ends.

Next, the speech recognition processing described above is explained more concretely with reference to FIGS. 1 to 9, taking as an example the case where the word "haikei" (はいけい) is input to the analog input section 1.

FIG. 3 shows the syllable segment candidates cut out by the syllable segment extraction section 3, together with the utterance content and phoneme symbols. The phoneme symbols are added only to make the changes in the speech easier to follow and have no direct bearing on the invention. Here, in the "hai" portion of the input speech, the syllable /ha/ is uniquely assigned to syllable segment candidate S0 and the syllable /i/ to syllable segment candidate S1. In the "kei" portion, however, two alternatives are created: one in which the syllable string /kei/ is assigned to syllable segment candidate S2, and one in which the syllable /ke/ is assigned to candidate S3 and the syllable /i/ to candidate S4.

(The above corresponds to step S1 of FIG. 2.)

The syllable segment candidates created as described above are expanded to generate syllable segment sequences as shown in FIG. 4.

In this case, two syllable segment sequences are generated: syllable segment sequence 0, consisting of the candidate sequence S0-S1-S2, and syllable segment sequence 1, consisting of the candidate sequence S0-S1-S3-S4. As noted above, each sequence carries a reliability; here the reliability of sequence 0 is assumed to be higher than that of sequence 1.

(The above corresponds to step S2 of FIG. 2.)

The syllable segment candidates S0, S1, and S2 of syllable segment sequence 0 shown in FIG. 4 each undergo monosyllable recognition in the monosyllable recognition section 6, and a syllable lattice as shown in FIG. 5 is generated from the results. The lattice based on syllable segment sequence 0 is called syllable lattice 0, and the lattice based on syllable segment sequence 1 is called syllable lattice 1.

(The above corresponds to steps S3 and S4 of FIG. 2.)

Syllable lattice 0, the first-ranked syllable lattice, is sent to the phrase recognition section 8, where syllable candidates for the syllable segment candidates S0, S1, and S2 of syllable lattice 0 are selected as follows to create character strings. First, the highest-reliability syllable candidate for each of S0, S1, and S2 is selected, giving the character string with the highest cumulative reliability, "haite" (はいて). Each character of that string is then replaced in turn with the other syllable candidates to create further strings. The strings are matched against the phrase recognition dictionary 9, strings that are not in the dictionary or are not grammatically valid are removed, and the phrase candidates shown in FIG. 6 are obtained. The candidate ranking in the figure is in descending order of the cumulative reliability of the syllable candidate sequence making up each phrase candidate.

(The above corresponds to step S5.)

In addition, all of the generated syllable lattices, that is, syllable lattice 0 and syllable lattice 1, are sent to the word recognition section 10, and word recognition is performed by matching against the word recognition dictionary 11 as described above. Here, all of the syllables /ha/, /i/, /ke/, and /i/ that make up the dictionary entry "haikei" (はいけい) are found among the syllable candidates of syllable lattice 1, so "haikei" is obtained as a word candidate associated with syllable lattice number "1". Likewise, all of the syllables of the dictionary entry "kaite" (かいて) are found among the syllable candidates of syllable lattice 0, so the word candidate "kaite" is obtained in association with syllable lattice number "0". Word candidate strings and syllable lattice numbers are obtained in association with each other in the same way, yielding the word candidates shown in FIG. 7. The candidate ranking in the figure is in descending order of the cumulative reliability of the syllable candidate sequence making up each word candidate.

(The above corresponds to step S6.)

The recognition candidates obtained in this way, that is, the phrase candidates and word candidates, are displayed on the display section 12. Two display methods are available, as follows.

In the first method, as shown in FIG. 8, a single window 21 opened on the display section 12 is divided into two areas, with word candidates displayed in one area and phrase candidates in the other, so that both kinds of recognition candidates appear simultaneously in the one window 21. In the second method, as shown in FIG. 9, two windows 22 and 23 are opened on the display section 12, with word candidates displayed in window 22 and phrase candidates in window 23.

(The above corresponds to step S7.)

A recognition candidate is then selected according to the display of FIG. 8 or FIG. 9, for example as follows.

Keys such as "↓" and "↑" on the keyboard 13 are used to select a recognition candidate within a window, and keys such as "→" and "←" are used to select a window.
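The key-driven selection can be modeled as a small state holder that tracks the highlighted window and the highlighted candidate within it. Everything in this sketch is illustrative: the class name, the key names "up", "down", "left", and "right" (placeholders for the vertical and horizontal keys of keyboard 13), the wrap-around behavior, and the candidate lists are all assumptions rather than details from the patent.

```python
class CandidateSelector:
    """Tracks which window and which candidate are currently highlighted."""

    def __init__(self, windows):
        self.windows = windows   # one list of candidates per window
        self.window = 0          # index of the highlighted window
        self.row = 0             # index of the highlighted candidate

    def press(self, key):
        if key in ("left", "right"):
            step = 1 if key == "right" else -1
            self.window = (self.window + step) % len(self.windows)
            self.row = 0         # restart at the top of the new window
        elif key in ("up", "down"):
            step = 1 if key == "down" else -1
            self.row = (self.row + step) % len(self.windows[self.window])

    def selected(self):
        return self.windows[self.window][self.row]

# Window 0 holds word candidates, window 1 holds phrase candidates (made up).
selector = CandidateSelector([["haikei", "kaite"], ["haite", "kaite"]])
for key in ["right", "down", "left"]:
    selector.press(key)
print(selector.selected())
```

The same object covers the single-window layout of FIG. 8 if each area of the divided window is treated as one entry in `windows`.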

(The above corresponds to step S8.)

By using phrase recognition and word recognition together in this way, recognition over a very large vocabulary, in units such as phrases, becomes possible, and the word "haikei", which the phrase recognition section 8 could not recognize, is recognized by the word recognition section 10, so the desired recognition result can be obtained by selection with the keyboard 13.

That is, in this embodiment, the syllable segment extraction section 3 generates syllable segment sequences from the input speech signal. For each generated sequence, the monosyllable recognition section 6 performs monosyllable recognition to generate the corresponding syllable lattice. The phrase recognition section 8 then performs phrase recognition on the first-ranked (highest-reliability) syllable lattice, while word recognition is performed on all of the syllable lattices. The display section 12 displays the phrase candidates and word candidates thus obtained, and the desired recognition result is selected with the keyboard 13 by referring to the display.

Using the phrase recognition of the phrase recognition section 8 together with the word recognition of the word recognition section 10 in this way gives the following effects. If frequently occurring words are registered in the word recognition dictionary 11, those words can be recognized accurately by word recognition. Words and phrases not registered in the word recognition dictionary 11, on the other hand, can be recognized efficiently by phrase recognition.

In other words, combining bottom-up phrase recognition with top-down word recognition makes it possible to recognize very-large-vocabulary speech with a balance of accuracy and efficiency. Moreover, because the identification unit is the syllable, as described above, standard patterns can be registered more efficiently than when words are the identification unit.

That is, according to this embodiment, speech can be identified efficiently because the syllable is used as the identification unit. In addition, because word-by-word recognition processing and phrase-by-phrase recognition processing are used together, speech with a very large vocabulary can be recognized both efficiently and accurately.

In the above embodiment, the syllable is used as the unit of speech identification. However, the present invention is not limited to this; phonemes, half-syllables, and the like may also be used. That is, the identification unit may be any so-called phonemic unit, a collective term for phonemes, half-syllables, syllables, and the like.

In the above embodiment, phrase recognition processing is executed on the syllable lattice with the highest reliability, and word recognition processing is executed on all of the syllable lattices. However, the present invention is not limited to this. The essential point is that a plurality of recognition processes with different units of recognition can be used together.

In the above embodiment, phrase candidates and word candidates are displayed on the display section 12 in kana characters, but kana-kanji conversion may be performed so that they are displayed in kanji.

In the above embodiment, the phrase recognition processing and the word recognition processing are executed simultaneously. However, the present invention is not limited to this; a switching means may be provided to switch between executing both recognition processes simultaneously and executing only one of them. With this arrangement, if, for example, only place names are registered in the word recognition dictionary 11, then when inputting general text that contains no place names, selecting phrase recognition processing alone keeps superfluous recognition candidates from being displayed on the display section 12, so that speech recognition can be carried out still more efficiently.
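The switching means described above could be modeled along the following lines. This is a hedged sketch only: the `Mode` enumeration, the function `run_selected`, and the recognizer callables are hypothetical names introduced for illustration, and the text does not say how the switching means is actually realized.

```python
from enum import Enum
from typing import Callable, Dict, List


class Mode(Enum):
    """Hypothetical settings of the switching means."""
    BOTH = "both"
    PHRASE_ONLY = "phrase_only"
    WORD_ONLY = "word_only"


def run_selected(
    mode: Mode,
    lattice: object,
    phrase_recognizer: Callable[[object], List[str]],
    word_recognizer: Callable[[object], List[str]],
) -> Dict[str, List[str]]:
    """Run only the recognition processes enabled by the current mode."""
    results: Dict[str, List[str]] = {}
    if mode in (Mode.BOTH, Mode.PHRASE_ONLY):
        results["phrases"] = phrase_recognizer(lattice)
    if mode in (Mode.BOTH, Mode.WORD_ONLY):
        results["words"] = word_recognizer(lattice)
    # A disabled process contributes no key at all, so a display stage
    # has no superfluous candidates to show for it.
    return results
```

With `Mode.PHRASE_ONLY`, for instance, the word recognizer is never called, which corresponds to the place-name example: general text is handled by phrase recognition alone.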

<Effects of the Invention> As is clear from the above, the speech recognition device of the present invention includes a display section and a plurality of recognition units that differ in their unit of recognition. It identifies input speech in phoneme units and, based on the identification results, has the plurality of recognition units match against dictionaries so that each recognizes the speech in its own unit of recognition, and it displays the recognition results on the display section. Recognition processes with different units of recognition can therefore be used together. Consequently, input speech can be identified in phoneme units, and by combining multiple recognition processes with different units of recognition, the strengths of each can be exploited to recognize speech with a very large vocabulary efficiently and accurately.

Further, in the speech recognition device of the present invention, the dictionary that each recognition unit uses when recognizing speech is configured to correspond to that recognition unit's unit of recognition and is provided separately for each recognition unit, and the recognition unit to be operated can be switched and selected by a switching means. Speech recognition can therefore be executed using only the recognition processing suited to the content of the input speech, so that speech can be recognized still more efficiently.

Further, in the speech recognition device of the present invention, one window opened on the display section is divided into a plurality of areas, recognition candidates from a different recognition unit are displayed in each area, and a desired recognition candidate can be selected from among the displayed recognition candidates by a recognition candidate selection means. The correct recognition candidate can therefore be selected easily from among the displayed candidates.

Further, in the speech recognition device of the present invention, recognition candidates from different recognition units are displayed in a plurality of windows opened on the display section, a desired window is selected from among the plurality of windows by a window selection means, and a desired recognition candidate can be selected from among the recognition candidates displayed in the selected window by a recognition candidate selection means. The correct recognition candidate can therefore be selected even more easily.

[Brief explanation of drawings]

Fig. 1 is a block diagram of an embodiment of the speech recognition device of the present invention.
Fig. 2 is a flowchart of the speech recognition processing operation.
Fig. 3 is a diagram showing an example of extracted syllable segment candidates.
Fig. 4 is a diagram showing an example of generated syllable segment strings.
Fig. 5 is a diagram showing an example of a generated syllable lattice.
Fig. 6 is a diagram showing an example of the obtained phrase candidates.
Fig. 7 is a diagram showing an example of the obtained word candidates.
Fig. 8 is a diagram showing an example of recognition candidate display.
Fig. 9 is a diagram showing another example of recognition candidate display.

1... analog input section, 2... speech analysis section, 3... syllable segment extraction unit, 5... CPU, 6... monosyllable recognition unit, 7... pattern memory, 8... phrase recognition unit, 9... phrase recognition dictionary, 10... word recognition unit, 11... word recognition dictionary, 12... display section, 13... keyboard.

Claims

[Claims]

(1) A speech recognition device that acoustically analyzes an input speech signal to convert it into feature parameters, identifies the feature parameters in phoneme units by a predetermined method, and recognizes speech based on the identification results, the speech recognition device comprising: a plurality of recognition units, each of which recognizes the input speech by matching against a dictionary in a meaningful unit of recognition such as a word or phrase based on the phoneme-unit identification results, the recognition units differing from one another in their unit of recognition; and a display section that displays the recognition results obtained by the recognition units, whereby recognition processes with different units of recognition can be used together.

(2) The speech recognition device described above, wherein the dictionary used by each recognition unit when recognizing speech is configured to correspond to the unit of recognition of that recognition unit and is provided for each recognition unit.

(3) The speech recognition device according to claim 2, further comprising a switching means for switching and selecting the recognition unit to be operated when performing recognition processing.

(4) The speech recognition device according to claim 1, wherein the display means has one window divided into a plurality of areas, each area displaying recognition candidates from a different recognition unit, and a desired recognition candidate can be selected from among the displayed recognition candidates by recognition candidate selection means.

(5) The speech recognition device according to claim 1, wherein the display means has a plurality of windows, each displaying recognition candidates from a different recognition unit, a desired window is selected from among the plurality of windows by window selection means, and a desired recognition candidate can be selected from among the recognition candidates displayed in the selected window by recognition candidate selection means.