JPH05119793A

JPH05119793A - Method and device for speech recognition

Info

Publication number: JPH05119793A
Application number: JP3306487A
Authority: JP
Inventors: Yoshio Nakadai; 芳夫中▲台▼
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1991-10-25
Filing date: 1991-10-25
Publication date: 1993-05-18

Abstract

(57) [Abstract]
[Purpose] To provide a speech word processor that achieves high recognition accuracy by restricting the recognition targets and that composes sentences with an unrestricted vocabulary from speech input.
[Constitution] In the standard pattern storage unit 6, the recognition vocabulary is classified into a plurality of subsets, and each speech standard pattern is registered with a subset flag together with its label name. The speaker's voice is input through the voice input unit 1, and the voice analysis unit 2 extracts a feature pattern from it. Meanwhile, the speaker enters, through the key input unit 5, a code representing the subset to which the uttered word belongs. The standard pattern selection unit 7 selects, from the standard pattern storage unit 6, the group of standard patterns of the subset corresponding to the input code, together with their label names. The pattern matching unit 8 selects, from the selected group, the standard pattern most similar to the feature pattern and outputs its label name via the recognition output unit.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a speech recognition method and apparatus suitable for, for example, a speech word processor that takes speech as input and outputs text.

[0002]

[Prior Art] Techniques for composing text by means of speech recognition, so-called speech word processors, have long been the subject of research. The reasons include that text can easily be composed exactly as a person speaks it, and that speech input is faster than character input. In a speech word processor, an important technique is, for example, to understand the meaning of speech uttered phrase by phrase and to write it out as mixed kanji-kana text. To realize this, it is important to establish (1) large-vocabulary speech recognition technology, and (2) recognition technology that can handle the inflected endings of verbs, adjectives, and the like, which are difficult to recognize.

[0003] In the prior art, for (1), large-vocabulary recognition has been pursued by greatly increasing the storage capacity for words. However, hardware storage capacity is limited, the processing time needed to recognize a large vocabulary at once grows in proportion to the number of words, and the recognition rate tends to fall as the number of similar-sounding words grows. This approach is therefore not considered effective for recognizing a large vocabulary with high accuracy. Another approach recognizes the vocabulary at the syllable level and converts syllable sequences into words, but conversion rules must be defined and word recognition accuracy depends on syllable recognition accuracy, so obtaining highly accurate results remains a subject of research.

[0004] For (2), there are methods based on knowledge processing, which raise recognition accuracy by estimating the inflected ending from the context between uttered words or phrases, and methods based on a large text database, which raise recognition accuracy by estimating the following ending from the preceding stem. However, methods that rely on knowledge processing or databases require complex processing in hardware, which increases the scale of the apparatus, and they are not suited to simple systems.

[0005]

[Problems to Be Solved by the Invention] Conventional speech recognition methods that improve the uncertain parts of the recognition result by applying knowledge processing, large-scale databases, and the like involve a large processing scale and are therefore unsuited to building a simple speech recognition apparatus. This can be expected to be solved by supplementing the parts where speech recognition gives ambiguous results with another, reliable input means such as a keyboard. Combinations of the following methods are conceivable: dividing the recognition target words of a large vocabulary into subsets and recognizing only the specific vocabulary designated by key input; presenting a plurality of recognition result candidates and selecting the result by key input; and attaching to each speech pattern a plurality of label names corresponding to inflected forms and selecting a label name for the recognized speech by key input. By limiting the number of keys that must be pressed, these methods can be realized without greatly reducing the convenience or speed of speech input.

[0006] An object of the present invention is to combine speech recognition with auxiliary input by human key operation or the like, thereby making it possible to build a simple large-vocabulary speech recognition apparatus that achieves high recognition accuracy without greatly reducing the convenience or speed of speech input, and to provide a speech word processor that composes sentences with an unrestricted vocabulary from speech input.

[0007]

[Means for Solving the Problems] To achieve the above object, the speech recognition method of the present invention is characterized by: classifying the recognition vocabulary into a plurality of subsets; registering each speech standard pattern in a standard pattern storage unit with a flag indicating its subset together with its label name; designating a subset by human code input and selecting the standard patterns and label names of that subset from the standard pattern storage unit; calculating the similarity between the standard patterns of the selected subset and an unknown speech pattern; and outputting the label name of a standard pattern with high similarity.
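
The steps of the method can be sketched as follows. This is only an illustration under stated assumptions, not the implementation described in the patent: feature patterns are reduced to short fixed-length vectors, similarity is taken as the negative squared Euclidean distance, and the names `StandardPattern`, `similarity`, and `recognize` are hypothetical. Words are written in romanized Japanese.

```python
from dataclasses import dataclass


@dataclass
class StandardPattern:
    subset: str            # subset flag, e.g. "noun group"
    label: str             # label name output on a match
    features: list[float]  # stand-in for the stored speech pattern


def similarity(a, b):
    # Assumed measure: higher is more similar (negative squared distance).
    return -sum((x - y) ** 2 for x, y in zip(a, b))


def recognize(store, subset, unknown):
    # Only patterns flagged with the designated subset are compared.
    candidates = [p for p in store if p.subset == subset]
    if not candidates:
        return None
    best = max(candidates, key=lambda p: similarity(p.features, unknown))
    return best.label


store = [
    StandardPattern("noun group", "watashi", [1.0, 0.0]),
    StandardPattern("noun group", "Tokyo", [0.0, 1.0]),
    StandardPattern("verb group", "iku", [0.9, 0.1]),
]
```

With this toy store, the same unknown pattern yields a different label depending on which subset is designated, which is the point of restricting the search.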

[0008] The speech recognition method of the present invention is further characterized by outputting the label names of a plurality of standard patterns judged to have high similarity, namely the most similar standard pattern, the next most similar, and so on, and finally selecting one label name from among them by human code input.
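
A small sketch of ranking several candidates and picking one by code might look like this. The similarity values are invented for illustration, and the numeric selection code ("1" for the first candidate, "2" for the second) is an assumption of this sketch; the embodiment described later uses "*" and "#" keys instead. The function names are hypothetical.

```python
def rank_candidates(scored, n=2):
    # scored: list of (label, similarity) pairs; higher similarity is better.
    ordered = sorted(scored, key=lambda item: item[1], reverse=True)
    return [label for label, _ in ordered[:n]]


def select_by_code(candidates, code):
    # Assumed convention: code "1" picks the first candidate, "2" the second.
    index = int(code) - 1
    if 0 <= index < len(candidates):
        return candidates[index]
    return None


# Illustrative similarity values only.
scores = [("Tokyo", 0.41), ("watashi", 0.87), ("anata", 0.62)]
candidates = rank_candidates(scores)
```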

[0009] The speech recognition method of the present invention is further characterized by assigning a plurality of label names to one standard pattern when registering it in the standard pattern storage unit, outputting the plurality of label names of a standard pattern judged to have high similarity, and finally selecting one label name from among them by human code input.

[0010] A speech recognition apparatus of the present invention that implements these methods is characterized by comprising: a standard pattern storage unit in which the recognition vocabulary is classified into a plurality of subsets and each speech standard pattern is stored with a flag indicating its subset together with its label name; a voice input unit that receives a speech signal; a signal analysis unit that detects the speech section in the speech signal and extracts a speech feature pattern by signal analysis; a key input unit that outputs a code in response to human key operation; a standard pattern selection unit that, according to a subset designation code entered through the key input unit, selects and loads the standard patterns and label names of the corresponding subset from the standard pattern storage unit; a pattern matching unit that calculates the similarity between the feature pattern output from the signal analysis unit and the standard patterns loaded by the standard pattern selection unit; and a label name output unit that, based on the calculation result of the pattern matching unit, outputs the label name of a standard pattern judged to have high similarity to the feature pattern.

[0011]

[Operation] In the present invention, the recognition vocabulary is classified into subsets such as sets by field or sets by part of speech, and the division is made so that few words with similar pronunciations fall within any one subset. Recognition can thus be performed with little loss of accuracy even when the overall vocabulary is large. Furthermore, by giving one speech pattern a plurality of label names corresponding to inflected forms and the like, inflected endings, which were difficult to recognize with the prior art, can be selected freely with the aid of human key input.

[0012]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings.

[0013] FIG. 1 is a block diagram of a speech recognition apparatus according to an embodiment of the present invention. The voice input unit 1 receives the speech signal of the speaker 11 and is, for example, a microphone or a telephone transmitter. The voice analysis unit 2 detects the speech section in the signal received from the voice input unit 1 and analyzes the detected section. One method of section detection uses short-time power, and one analysis method is the LPC cepstrum. The label input unit 3 is used to enter a label name for the analyzed speech section and is, for example, a computer keyboard. The number of label names per utterance may be, for example, from a minimum of one to a maximum of nine, and a label name may be written in mixed kanji and kana. The label output unit 4 displays label names as recognition results and also indicates to the speaker 11 such things as when to speak; it is, for example, a computer display or a speech synthesizer that synthesizes and outputs speech from character data. The key input unit 5 is used together with the standard pattern selection unit 7 and the recognition output unit 9 to select the recognition vocabulary and the recognition result, and also allows periods, commas, line breaks, deletion of words, and the like to be entered by key input alone when editing a sentence; it is, for example, a PB (push-button) tone recognition and output device for a push-button telephone. The key input unit 5 and the label input unit 3 may be the same unit.
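
Section detection by short-time power, mentioned above, could be sketched roughly as follows. The text names the method but gives no frame length or threshold, so `frame_len`, `threshold`, the non-overlapping framing, and the function names are all assumptions made for this illustration. LPC cepstrum analysis is not shown.

```python
def short_time_power(samples, frame_len):
    # Mean squared amplitude of each non-overlapping frame.
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    return [sum(s * s for s in f) / frame_len for f in frames]


def detect_section(samples, frame_len=4, threshold=0.01):
    # Returns (start, end) sample indices spanning the first to the last
    # frame whose power exceeds the threshold, or None if no frame does.
    power = short_time_power(samples, frame_len)
    active = [i for i, p in enumerate(power) if p > threshold]
    if not active:
        return None
    return active[0] * frame_len, (active[-1] + 1) * frame_len


# Toy signal: silence, a short burst, silence.
signal = [0.0] * 8 + [0.5, -0.5, 0.4, -0.4, 0.3, -0.3, 0.2, -0.2] + [0.0] * 8
```

A practical detector would typically use overlapping frames and some hangover time around the burst; the sketch keeps only the thresholding idea.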

[0014] In registration mode, the standard pattern storage unit 6 holds each speech pattern analyzed by the voice analysis unit 2 (called a standard pattern) together with the label name entered through the label input unit 3. The standard pattern selection unit 7 extracts, from all the standard patterns stored in the standard pattern storage unit 6, only the subset corresponding to the code entered through the key input unit 5 and sets it up for use in the calculation by the pattern matching unit 8. In recognition mode, the pattern matching unit 8 calculates the similarity between the speech feature pattern just analyzed by the voice analysis unit 2, whose label name is unknown (called an unknown pattern, in contrast to a standard pattern), and the plurality of standard patterns selected by the standard pattern selection unit 7. One applicable technique is DP matching, which is known as a pattern matching method for speech recognition. The recognition output unit 9 outputs the one or more label names attached to the one or more standard patterns that the similarity calculation in the pattern matching unit 8 has judged most similar to the unknown pattern. The recognition output unit 9 can also select and output only a specific recognition result or label name according to a code entered through the key input unit 5. The sentence output unit 10 outputs the sentence made up of the word sequence confirmed by the recognition output unit 9.
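
DP matching aligns two sequences of different lengths by dynamic programming. The following is a generic textbook sketch of that idea, not the specific formulation used in the patent: each frame is a single number rather than a feature vector, the local distance is an absolute difference, no slope constraints or length normalization are applied, and the function names are hypothetical.

```python
def dp_distance(a, b):
    # cost[i][j] = minimum accumulated distance aligning a[:i] with b[:j].
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a advances
                                     cost[i][j - 1],      # b advances
                                     cost[i - 1][j - 1])  # both advance
    return cost[len(a)][len(b)]


def most_similar(unknown, patterns):
    # patterns: dict mapping label name -> standard pattern (sequence).
    # A smaller DP distance is treated as a higher similarity.
    return min(patterns, key=lambda label: dp_distance(unknown, patterns[label]))


# Toy standard patterns; the numbers are illustrative only.
patterns = {"watashi": [1, 3, 4, 2], "Tokyo": [4, 4, 1, 1]}
```

Because the alignment may repeat frames, an utterance spoken more slowly than the registered one can still match it with zero distance.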

[0015] FIG. 2 shows the standard pattern storage unit 6 in detail. In addition to a standard pattern dictionary, which holds each standard pattern with one or more label names, the standard pattern storage unit 6 has a large vocabulary dictionary, which holds the subset names used to classify the recognition vocabulary and the code uniquely assigned to each.

[0016] FIG. 2(a) is a concrete example of the large vocabulary dictionary, in which the recognition vocabulary is classified into sets by part of speech. Here the particle group, noun group, verb group, and so on are each a subset; each is assigned a unique code and registered in the standard pattern storage unit 6 as the large vocabulary dictionary. As shown in FIG. 2(a), 01# denotes the particle group, 02# the noun group, and 03# the verb group.
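
The large vocabulary dictionary of FIG. 2(a) is essentially a lookup from key codes to subset names. A minimal sketch, assuming that "#" acts as the terminator of a two-digit code (the text shows the codes but does not state this explicitly) and using hypothetical names:

```python
# Codes and subset names as given for FIG. 2(a).
LARGE_VOCABULARY_DICTIONARY = {
    "01": "particle group",
    "02": "noun group",
    "03": "verb group",
}


def subset_for_key_input(keys):
    # keys: the raw key sequence, e.g. "02#"; "#" is assumed to end the code.
    if not keys.endswith("#"):
        return None
    return LARGE_VOCABULARY_DICTIONARY.get(keys[:-1])
```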

[0017] FIG. 2(b) shows one record of the standard pattern dictionary. A standard pattern requires at least one label name, and further label names (for example, up to nine in total) may be attached. In FIG. 2(b), the flag indicates the subset to which the speech pattern of the record belongs; it holds a subset name from the large vocabulary dictionary shown in FIG. 2(a). The speech pattern is the standard (speech) pattern used for recognition and holds the result of analysis by the voice analysis unit 2 in bit representation. Label 1 holds the label name that is output first as the recognition result. Labels 2 to N hold the next-candidate label names, which are output one after another each time the next candidate is requested through the key input unit 5. The end label marks the end of the record for one speech pattern.
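
The record layout of FIG. 2(b) could be modeled as below. This is a sketch under assumptions: the class and field names are hypothetical, the pattern bytes are placeholders, the nine-label limit is the example figure from the text, and the end label is represented implicitly by the length of the list rather than by a stored marker.

```python
from dataclasses import dataclass, field

MAX_LABELS = 9  # the text gives nine as an example upper limit


@dataclass
class Record:
    flag: str       # subset name from the large vocabulary dictionary
    pattern: bytes  # bit representation of the analyzed speech (placeholder)
    labels: list[str] = field(default_factory=list)  # label 1 .. label N

    def __post_init__(self):
        if not 1 <= len(self.labels) <= MAX_LABELS:
            raise ValueError("a record needs between 1 and 9 label names")

    def label(self, rank):
        # rank 0 is label 1 (output first); higher ranks are next candidates.
        return self.labels[rank] if rank < len(self.labels) else None


iku = Record("verb group", b"\x00\x01", ["iku", "ikanai", "ikimasu", "ikeba"])
```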

[0018] FIGS. 2(c) and 2(d) are concrete examples of the standard pattern dictionary: (c) for the noun group and (d) for the verb group. "******" stands for a speech pattern (standard pattern) in bit representation.

[0019] The operation of the apparatus of FIG. 1 is described below. Operation is divided into a registration mode, in which the standard patterns used for recognition are registered, and a recognition mode, in which unknown patterns are recognized. Each is described in turn.

Registration mode:

In registration mode, speech uttered by the speaker 11 is registered in the standard pattern storage unit 6. First, when the speaker 11 selects the speech registration operation through the key input unit 5, the apparatus outputs the message "Please input speech" from the label output unit 4, on the screen or as synthesized speech, to prompt the speaker 11 to speak. When the speaker 11 utters words such as "watashi" (I), "Tokyo", "iku" (go), "anata" (you), "matsu" (wait), and "arigatou" (thank you), the speech is captured by the voice input unit 1, subjected to section detection and signal analysis word by word in the voice analysis unit 2, and sent to the standard pattern storage unit 6. The speaker 11 then enters the label name of the utterance through the label input unit 3, and the standard pattern storage unit 6 registers the speech feature pattern in the standard pattern dictionary as a labeled standard pattern. A label name may simply be the uttered word itself, such as "iku" or "matsu". Alternatively, as shown in FIG. 2(d), a plurality of labels may be attached: for the utterance "iku", the labels "iku" (go), "ikanai" (do not go), "ikimasu" (go, polite form), and "ikeba" (if ... goes); for the utterance "matsu", the labels "matsu" (wait), "matanai" (do not wait), "matenai" (cannot wait), and "omachi kudasai" (please wait). A whole sentence such as "maido arigatou gozaimasu" (thank you for your continued patronage) may also be registered for the utterance "arigatou". Particles such as "wa", "de", and "e" are registered in the same way. At the same time, the standard pattern storage unit 6 assigns a subset name to each stored pattern according to input from the label input unit 3. For example, as shown in FIGS. 2(c) and 2(d), the subset "noun group" is registered for patterns such as "Tokyo", "watashi", and "anata", and "verb group" for patterns such as "iku" and "matsu". Likewise, "particle group" is registered for patterns such as "wa", "de", and "e".
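
Registration as described above amounts to storing each analyzed pattern with its label names and its subset name. A minimal sketch, with hypothetical function names, romanized Japanese words, and the string "******" standing in for the analyzed speech pattern as in FIG. 2:

```python
def register(dictionary, pattern, labels, subset):
    # Append one labeled standard pattern to the standard pattern dictionary.
    if not labels:
        raise ValueError("at least one label name is required")
    dictionary.append({"flag": subset, "pattern": pattern, "labels": list(labels)})


def patterns_in(dictionary, subset):
    # First label name of every entry registered under the given subset.
    return [entry["labels"][0] for entry in dictionary if entry["flag"] == subset]


standard_pattern_dictionary = []
register(standard_pattern_dictionary, "******", ["Tokyo"], "noun group")
register(standard_pattern_dictionary, "******", ["watashi"], "noun group")
register(standard_pattern_dictionary, "******",
         ["iku", "ikanai", "ikimasu", "ikeba"], "verb group")
register(standard_pattern_dictionary, "******", ["wa"], "particle group")
```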

[0020] It is also decided which numeric value (code) entered through the key input unit 5 designates each subset name in the standard pattern dictionary, and this is registered in the large vocabulary dictionary. For example, as shown in FIG. 2(a), "particle group" = 01#, "noun group" = 02#, and "verb group" = 03#. These registrations are made through the label input unit 3 and the key input unit 5.

[0021] In this way, the standard pattern dictionary and the large vocabulary dictionary are registered in the standard pattern storage unit 6. In addition, codes are defined for operations that control the text directly, without speech recognition, through PB tone input or the like from the key input unit 5, such as period, comma, line break, and cancel: for example, "period" = 91#, "comma" = 92#, "line break" = 99#, and "cancel" = 90#.

Recognition mode:

In recognition mode, text is composed from the results of matching each input unknown pattern against a subset of the standard patterns. For example, the sentence "Watashi wa Tokyo ni ikimasu." (I am going to Tokyo.) is composed by speech input as follows. The recognition of the word "watashi" (I) is described here; FIG. 3 is a conceptual diagram of this recognition.

[0022] First, when the speaker 11 selects the speech recognition operation through the key input unit 5, the apparatus outputs a message such as "Please input a PB tone" from the label output unit 4, on the screen or as synthesized speech, to prompt the speaker 11 for PB input. The speaker 11 enters 02# on the key input unit 5 to designate the "noun group" as the subset. On receiving 02# from the key input unit 5, the standard pattern selection unit 7 first obtains 02# = noun group from the large vocabulary dictionary in the standard pattern storage unit 6, then searches the standard pattern dictionary using "noun group" as the key and loads the subset of standard patterns flagged "noun group", each paired with its label names. The label output unit 4 then displays a message such as "Please input speech" to prompt the speaker 11 to speak. When the speaker 11 utters, for example, "watashi", the speech is analyzed by the voice analysis unit 2 and the result is sent to the pattern matching unit 8 as an unknown pattern. The pattern matching unit 8 calculates the similarity between the unknown pattern and each of the standard patterns of the "noun group" subset previously selected by the standard pattern selection unit 7 and sends the results to the recognition output unit 9. The recognition output unit 9 sorts the recognition results in descending order of similarity and outputs them through the label output unit 4, on the screen or as speech.

[0023] If there is only one valid recognition result, or only a single result is required, the result from the recognition output unit 9 is confirmed as it is. If the result is to be confirmed from among a plurality of candidates, the first-ranked candidate is output on the screen or as speech by the label output unit 4, and the apparatus waits for input from the key input unit 5. If the speaker then enters "#" (confirm) on the key input unit 5, the result is confirmed; if "*" (next candidate) is entered, the next candidate is output and the apparatus again waits for input from the key input unit 5; if "0" (speech input) is entered, the apparatus returns to speech input. This assignment of keys to instructions is only an example, and any combination may be used. In this way, words are entered by speech one at a time, candidates are confirmed by key input on the key input unit 5, and the sentence is built up.
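
The key-driven confirmation just described can be sketched as a small loop over key presses. The key meanings ("#" confirm, "*" next candidate, "0" return to speech input) follow the example in the text; the function name is hypothetical, and the behavior when "*" is pressed on the last candidate (here: stay on it) is an assumption, since the text does not specify it.

```python
def confirm(candidates, keys):
    # candidates: label names in descending order of similarity.
    # keys: iterable of key presses. Returns the confirmed label name,
    # or None if the speaker asks to speak again ("0") or input runs out.
    position = 0
    for key in keys:
        if key == "#":      # confirm the candidate being presented
            return candidates[position]
        if key == "*":      # present the next candidate, if any remains
            if position + 1 < len(candidates):
                position += 1
        elif key == "0":    # return to speech input
            return None
    return None
```

For instance, pressing "*" twice and then "#" on the list `["watashi", "anata", "Tokyo"]` confirms the third candidate.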

[0024] For subsets such as the "verb group", in which words are registered with several inflected forms per utterance, the stem is confirmed first and the inflected form is confirmed next. FIG. 4 is a conceptual diagram of this recognition.

[0025] For example, to produce the word "ikimasu" (go, polite form) by uttering "iku" (go), the speaker first enters 03# by key input to select the "verb group" and then utters "iku". From the stem candidates in the recognition result, such as "iku" and "matsu", "iku" is selected by entering "#". Next, "ikimasu" is selected from among the label names attached to the utterance "iku", such as "iku", "ikanai", and "ikimasu". This is done by the same key input procedure as above: while the apparatus is waiting for input from the key input unit 5, entering "#" confirms the result, entering "*" displays the next candidate and returns to the waiting state, and entering "0" returns to speech input. In this way, even when there are several recognition candidates and each candidate has several label names, the recognition result can be confirmed by input from the key input unit 5.
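
The two-stage selection, first the stem and then one of its label names, can be sketched by applying the same key procedure twice. Function names are hypothetical, words are romanized Japanese, the key sequence is passed as a single string for simplicity, and staying on the last option when "*" is pressed there is an assumption of this sketch.

```python
def pick(options, keys):
    # Walk through options with "*" (next) until "#" (confirm).
    # Returns (choice, unused keys); choice is None on "0" or if keys run out.
    position = 0
    for used, key in enumerate(keys, start=1):
        if key == "#":
            return options[position], keys[used:]
        if key == "*" and position + 1 < len(options):
            position += 1
        elif key == "0":
            return None, keys[used:]
    return None, ""


def recognize_inflected(stems, labels_by_stem, keys):
    # Stage 1: confirm the stem. Stage 2: confirm one of its label names.
    stem, rest = pick(stems, keys)
    if stem is None:
        return None
    label, _ = pick(labels_by_stem[stem], rest)
    return label


labels_by_stem = {
    "iku": ["iku", "ikanai", "ikimasu", "ikeba"],
    "matsu": ["matsu", "matanai", "matenai", "omachi kudasai"],
}
```

With stem candidates `["iku", "matsu"]`, the key string `"#**#"` confirms the stem "iku" and then steps twice through its labels to confirm "ikimasu".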

[0026] The word sequence produced by speech input is completed as a sentence by adding "period", "comma", and the like by key input. Finally, when the key corresponding to "line break" is entered on the key input unit 5, the confirmed sentence is output from the sentence output unit 10.
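
Sentence assembly from confirmed words and control codes might be sketched as follows. The codes are the examples given in the description (91# period, 92# comma, 99# line break, 90# cancel). The function name is hypothetical, the meaning of "cancel" (dropping the most recent item) is an assumption, and the spaces between romanized words are only a display convenience of this sketch.

```python
CONTROL_CODES = {"91#": "period", "92#": "comma", "99#": "line break", "90#": "cancel"}


def compose(events):
    # events: confirmed words (plain strings) interleaved with control codes.
    # Returns the list of sentences emitted at each "line break".
    words, output = [], []
    for event in events:
        action = CONTROL_CODES.get(event)
        if action is None:
            words.append(event)    # a word confirmed by recognition
        elif action == "period":
            words.append(".")
        elif action == "comma":
            words.append(",")
        elif action == "cancel":
            if words:
                words.pop()        # assumed: drop the most recent item
        elif action == "line break":
            text = " ".join(words).replace(" .", ".").replace(" ,", ",")
            output.append(text)
            words = []
    return output
```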

[0027] An embodiment of the present invention has been described above, but the configuration of FIG. 1 can also be realized as a combination of a terminal device and a center device. In this case, the terminal consists only of the voice input unit 1, using for example a telephone handset, the key input unit 5, using for example the telephone's PB tone output device, and the label output unit 4, such as a display. All other units, namely the label input unit 3, standard pattern storage unit 6, standard pattern selection unit 7, pattern matching unit 8, recognition output unit 9, and so on, are placed at the center, and the terminal and the center are connected by a communication line.

[0028]

[Effects of the Invention] As is clear from the above description, the speech recognition method and apparatus of the present invention achieve the following effects.
(1) High-accuracy large-vocabulary speech recognition can be achieved by restricting recognition to a subset of the recognition targets.
(2) Because the result is selected from a plurality of recognition candidates, a wrong first-ranked result can be corrected as long as the right one is among the candidates.
(3) By giving one speech pattern a plurality of label names and having a specific label name selected for output from the recognition result, free-form sentences can be composed even from speech input with a small vocabulary.

[Brief description of drawings]

FIG. 1 is a block diagram of an embodiment of the speech recognition apparatus according to the present invention.

FIG. 2 shows the standard pattern storage unit of FIG. 1 in detail.

FIG. 3 is a conceptual diagram of the operation of FIG. 1 when recognizing the word "watashi" (I).

FIG. 4 is a conceptual diagram of the operation of FIG. 1 when recognizing the word "ikimasu" (go, polite form).

[Explanation of symbols]

1 voice input unit
2 voice analysis unit
3 label input unit
4 label output unit
5 key input unit
6 standard pattern storage unit
7 standard pattern selection unit
8 pattern matching unit
9 recognition output unit
10 sentence output unit
11 speaker

Claims

1. A speech recognition method characterized by: classifying a recognition vocabulary into a plurality of subsets; registering each speech standard pattern in a standard pattern storage unit with a flag indicating its subset together with its label name; designating a subset by human code input and selecting the standard patterns and label names of that subset from the standard pattern storage unit; calculating the similarity between the standard patterns of the selected subset and an unknown speech pattern; and outputting the label name of a standard pattern with high similarity.

2. The speech recognition method according to claim 1, characterized by outputting the label names of a plurality of standard patterns judged to have high similarity and finally selecting one label name from among them by human code input.

3. The speech recognition method according to claim 1, characterized by assigning a plurality of label names to one standard pattern when registering it in the standard pattern storage unit, outputting the plurality of label names of a standard pattern judged to have high similarity, and finally selecting one label name from among them by human code input.

4. A speech recognition apparatus characterized by comprising: a standard pattern storage unit in which a recognition vocabulary is classified into a plurality of subsets and each speech standard pattern is stored with a flag indicating its subset together with its label name; a voice input unit that receives a speech signal; a signal analysis unit that detects the speech section in the speech signal and extracts a speech feature pattern by signal analysis; a key input unit that outputs a code in response to human key operation; a standard pattern selection unit that, according to a subset designation code entered through the key input unit, selects and loads the standard patterns and label names of the corresponding subset from the standard pattern storage unit; a pattern matching unit that calculates the similarity between the feature pattern output from the signal analysis unit and the standard patterns loaded by the standard pattern selection unit; and a label name output unit that, based on the calculation result of the pattern matching unit, outputs the label name of a standard pattern judged to have high similarity to the feature pattern.