JPH06118989A - Continuous speech recognizing method

Info

Publication number: JPH06118989A
Application number: JP4287115A
Authority: JP
Inventors: Kazuya Takeda; 一哉武田; Shingo Kuroiwa; 眞吾黒岩; Seiichi Yamamoto; 誠一山本
Original assignee: Kokusai Denshin Denwa KK
Current assignee: KDDI Corp
Priority date: 1992-10-02
Filing date: 1992-10-02
Publication date: 1994-04-28

Abstract

PURPOSE: To recognize continuous utterances that contain unregistered words by using unregistered word patterns created from speech other than the registered words as learning data. CONSTITUTION: A sentence-head cutout processing unit 8 divides an uttered sentence into three equal parts and cuts out the first part as the sentence head; an in-sentence cutout processing unit 9 cuts out the section other than the sentence head and sentence end; and a sentence-end cutout processing unit 10 divides the input into three equal parts and cuts out the last part as the sentence end. The result of each cutout is sent to a standard pattern creation processing unit 3 in the same way as for registered words, producing three kinds of unregistered word patterns, which are sent to an unregistered word standard pattern storage unit 11. The unregistered word standard pattern storage unit 11 stores the three unregistered word standard patterns for the sentence head, middle, and end. Differences that depend on position, such as hesitations being likely at the beginning of a sentence, can therefore be reflected.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a method for recognizing continuously uttered speech in a spoken-language man-machine interface.

[0002] Spoken dialogue systems that go beyond simple voice input and output, and instead convey complex information efficiently through human-like dialogue, are approaching practical use, thanks for example to advanced speech modeling techniques made feasible by faster computers.

[0003] Developing such a system requires a speech input method that extracts sufficient information from the user's utterances.

[0004]

[Prior Art] FIG. 1 is a block diagram of a hardware implementation of a method (Seiichi Nakagawa, "Speech Recognition by Probabilistic Models") that uses Markov models as standard patterns, uses the probability that the Markov model outputs the input speech as the matching score, and uses a regular grammar (finite state automaton) as the syntactic constraint.

[0005] This method matches the input speech against the standard patterns for every word sequence that satisfies the given grammatical constraints, so that speech recognition and syntactic analysis proceed simultaneously.

[0006] To this end, grammatical constraints that describe, in the form of a finite state automaton, the positions at which registered words may appear in a sentence are first stored in the grammar constraint storage unit 2.

[0007] Next, to create a standard pattern for each registered word, input speech corresponding to the registered words is supplied as learning data and analyzed by the acoustic analysis processing unit 1, and the result 1a (during learning) is sent to the standard pattern creation processing unit 3. The standard patterns 3a created by processing the words in turn are stored in the word standard pattern storage unit 5.

[0008] The word standard pattern concatenation processing unit 4 concatenates the word standard patterns stored in the word standard pattern storage unit 5 in accordance with the grammatical constraints stored in the grammar constraint storage unit 2 to create sentence standard patterns, and passes them to the sentence standard pattern storage unit 6 for storage.
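The concatenation step can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy grammar, the word list, and the representation of a word standard pattern as a list of state labels are hypothetical and are not taken from the patent. The sketch enumerates every word sequence an acyclic finite state automaton accepts and concatenates the corresponding word patterns into one sentence pattern per sequence.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical grammar: state -> list of (word, next_state). State 3 is final.
GRAMMAR: Dict[int, List[Tuple[str, int]]] = {
    0: [("right", 1), ("left", 1)],
    1: [("90", 2), ("180", 2)],
    2: [("turn", 3)],
    3: [],
}
FINAL_STATES = {3}

# Hypothetical word standard patterns: each word is a short list of
# model-state labels standing in for a trained Markov model.
WORD_PATTERNS: Dict[str, List[str]] = {
    w: [f"{w}/{i}" for i in range(2)]
    for w in ["right", "left", "90", "180", "turn"]
}


def enumerate_sentences(state: int = 0, prefix: Tuple[str, ...] = ()) -> Iterator[List[str]]:
    """Yield every word sequence accepted by the (acyclic) grammar."""
    if state in FINAL_STATES:
        yield list(prefix)
    for word, nxt in GRAMMAR[state]:
        yield from enumerate_sentences(nxt, prefix + (word,))


def build_sentence_pattern(words: List[str]) -> List[str]:
    """Concatenate word patterns, in order, into one sentence pattern."""
    pattern: List[str] = []
    for w in words:
        pattern.extend(WORD_PATTERNS[w])
    return pattern


# One sentence standard pattern per word sequence the grammar allows.
sentence_patterns = {
    " ".join(ws): build_sentence_pattern(ws) for ws in enumerate_sentences()
}
```

Explicit enumeration only works for small acyclic grammars; a grammar with loops would need the patterns to be connected as a network rather than listed one by one.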

[0009] At recognition time, when the user utters a sentence to be recognized, the acoustic analysis result 1c (during recognition) is sent to the pattern matching processing unit 7, where it is matched against all the sentence standard patterns stored in the sentence standard pattern storage unit 6.

[0010] Here, the matching distance is the probability that the acoustic analysis result is output from the Markov model serving as the standard pattern. The matching distance is a numerical value that expresses the similarity between the input speech and a standard pattern.
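As an illustration of using an output probability as the matching score, the following Python sketch computes the log probability that a small discrete hidden Markov model outputs an observation sequence, using the standard forward algorithm. The patent does not specify the model type or its parameters; the two-state models, the symbol alphabet {0, 1}, and all numbers below are assumptions chosen only to make the example runnable.

```python
import math
from typing import List, Sequence


def forward_log_prob(
    obs: Sequence[int],
    init: List[float],
    trans: List[List[float]],
    emit: List[List[float]],
) -> float:
    """Log probability that a discrete HMM outputs `obs` (forward algorithm).

    Assumes `obs` is non-empty. Uses plain probabilities, so very long
    sequences would need scaling or log-space arithmetic.
    """
    n = len(init)
    alpha = [init[s] * emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][o]
            for s in range(n)
        ]
    total = sum(alpha)
    return math.log(total) if total > 0 else float("-inf")


# Two hypothetical left-to-right models sharing the same topology.
INIT = [1.0, 0.0]
TRANS = [[0.5, 0.5], [0.0, 1.0]]
MODELS = {
    "pattern_A": [[0.9, 0.1], [0.2, 0.8]],  # tends to emit 0 then 1
    "pattern_B": [[0.1, 0.9], [0.8, 0.2]],  # tends to emit 1 then 0
}

observation = [0, 1]
scores = {
    name: forward_log_prob(observation, INIT, TRANS, emit)
    for name, emit in MODELS.items()
}
best_pattern = max(scores, key=scores.get)  # pattern with the best match
```

A higher log probability means a better match, so the pattern with the largest score is selected as the recognition result.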

[0011] The word sequence corresponding to the sentence standard pattern that gives the best matching result is therefore output as the recognition result. This makes possible a system that recognizes a large vocabulary of several thousand words in continuous speech from unspecified speakers.

[0012]

[Problems to Be Solved by the Invention] Because this conventional method performs recognition by matching the input against standard patterns of registered words, the user must speak using registered words only.

[0013] In addition, when a hesitation or a slip of the tongue occurs during speech input, no standard pattern exists that correctly matches it, so obtaining a correct recognition result becomes difficult.

[0014] The problem the present invention seeks to solve is the malfunction of continuous speech recognition devices on speech input that contains such unregistered words and misstatements.

[0015]

[Means for Solving the Problems] The present invention is a continuous speech recognition method that recognizes continuously uttered speech by continuously matching word standard patterns represented by Markov models against input speech in accordance with grammatical constraints, characterized in that the positions at which unregistered word patterns may appear in the continuous speech are defined in the grammatical constraints, and continuous speech containing unregistered words is recognized using unregistered word patterns created by using speech other than the registered words as learning data.

[0016] This recognition method is further characterized in that three kinds of unregistered word patterns are created, for the beginning, middle, and end of a sentence, and matched against unregistered words.

[0017]

[Embodiment] FIG. 2 shows an embodiment of the present invention that uses unregistered word patterns, created by using speech other than the registered words as learning data, to recognize speech other than the registered words as unregistered words when matching against the input speech.

[0018] The grammar constraint storage unit 2, word standard pattern concatenation processing unit 4, standard pattern creation processing unit 3, word standard pattern storage unit 5, sentence standard pattern storage unit 6, and pattern matching processing unit 7 perform the same processing as in the conventional method.

[0019] The newly added components are the sentence-head cutout processing unit 8, the in-sentence cutout processing unit 9, the sentence-end cutout processing unit 10, and the unregistered word standard pattern storage unit 11.

[0020] For the analysis result 1a of the speech input used to create word standard patterns, a word standard pattern is created for each registered word and stored in the word standard pattern storage unit, as in the conventional method.

[0021] Next, when learning the unregistered word patterns, the analysis result 1b of the speech input is divided into three sections, the sentence head, sentence middle, and sentence end, by the sentence-head cutout processing unit 8, the in-sentence cutout processing unit 9, and the sentence-end cutout processing unit 10.

[0022] Specifically, the sentence-head cutout processing unit 8 divides the uttered sentence into three equal parts and cuts out the first third as the sentence head. The in-sentence cutout processing unit 9 cuts out the section of the utterance other than the sentence head and the sentence end. The sentence-end cutout processing unit 10 divides the uttered sentence into three equal parts and cuts out the last third as the sentence end.
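The three-way cutout described above can be sketched in a few lines of Python. The frame representation (any sequence of per-frame analysis results) and the handling of lengths that are not a multiple of three are assumptions; the patent only states that the utterance is divided into three equal parts. Here the head and tail each take a third rounded down, and the middle takes whatever remains, which matches the description of the middle as the section other than the head and tail.

```python
from typing import Dict, List, Sequence, Tuple


def split_into_thirds(frames: Sequence) -> Tuple[list, list, list]:
    """Cut one utterance into sentence-head, middle, and sentence-end sections."""
    n = len(frames)
    k = n // 3  # assumed rounding rule: head and tail get floor(n / 3) frames
    head = list(frames[:k])
    middle = list(frames[k:n - k])
    tail = list(frames[n - k:])
    return head, middle, tail


def collect_training_data(utterances: Sequence[Sequence]) -> Dict[str, List[list]]:
    """Pool the three sections over many utterances, one pool per position.

    Each pool would then be passed to pattern creation to train one
    unregistered word pattern (pooling across utterances is an assumption).
    """
    data: Dict[str, List[list]] = {"head": [], "middle": [], "tail": []}
    for frames in utterances:
        head, middle, tail = split_into_thirds(frames)
        data["head"].append(head)
        data["middle"].append(middle)
        data["tail"].append(tail)
    return data


# Usage with integer stand-ins for acoustic frames.
pools = collect_training_data([list(range(9)), list(range(10))])
```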

[0023] The result of each cutout is sent to the standard pattern creation processing unit 3 in the same way as for registered words, three kinds of unregistered word patterns are created, and these are sent to the unregistered word standard pattern storage unit 11.

[0024] The unregistered word standard pattern storage unit 11 stores the three unregistered word standard patterns for the sentence head, sentence middle, and sentence end.

[0025] Dividing the learning speech for the unregistered word patterns into three kinds, sentence head, sentence middle, and sentence end, and creating a standard pattern for each makes it possible to reflect positional differences, for example that hesitations such as "ano" ("um") and "etto" ("er") tend to appear at the beginning of a sentence.

[0026] The grammatical constraints are used to concatenate word standard patterns in the same way as in the conventional example, but in implementing the present invention the positions at which unregistered words may appear are indicated within the grammatical constraints, as shown in FIG. 3.
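FIG. 3 is not reproduced in this text, so the following Python sketch uses a much smaller, hypothetical automaton to show one way unregistered word positions could be written into a finite state grammar. The state numbers, word labels, and the choice to model unregistered words as self-loop arcs are all assumptions for illustration and do not reproduce the automaton of FIG. 3.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical labels for the three unregistered word patterns.
UNK_HEAD, UNK_MID, UNK_TAIL = "<unk-head>", "<unk-mid>", "<unk-tail>"

# Hypothetical automaton: state -> list of (label, next_state).
# Self-loops mark where an unregistered word may appear.
ARCS: Dict[int, List[Tuple[str, int]]] = {
    0: [(UNK_HEAD, 0), ("right", 1), ("left", 1)],
    1: [(UNK_MID, 1), ("90deg", 2), ("180deg", 2)],
    2: [("turn", 3)],
    3: [(UNK_TAIL, 3)],
}
FINAL_STATES = {3}


def accepts(labels: Sequence[str], start: int = 0) -> bool:
    """Return True if the label sequence is allowed by the grammar."""
    states = {start}
    for label in labels:
        states = {
            nxt
            for s in states
            for arc_label, nxt in ARCS.get(s, [])
            if arc_label == label
        }
        if not states:
            return False
    return bool(states & FINAL_STATES)
```

For example, `accepts([UNK_HEAD, "right", "90deg", "turn", UNK_TAIL])` returns `True`, while a sentence-head pattern placed after "right" is rejected because no such arc exists at that state.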

[0027] Next, the difference between this embodiment and the conventional method is explained using the example input utterance "Etto, migi ni, 90-do maware yo." ("Um, turn 90 degrees to the right.").

[0028] In the conventional method, no standard patterns are prepared for "etto" or for the "yo" in "(maware) yo", so an incorrect recognition result is likely to be output.

[0029] In this embodiment, the "etto" portion is matched with high probability against the sentence-head unregistered word pattern at state 0.

[0030] Likewise, the "yo" in "(maware) yo" is matched with high probability against the sentence-end unregistered word pattern at state 9.

[0031] As this embodiment shows, the present invention makes it possible to correctly recognize continuous speech into which unregistered words such as hesitations have been inserted.
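The contrast between the two methods on the example utterance can be mimicked with a toy, segment-level Python sketch. It is a strong simplification and rests on several assumptions: the utterance is treated as five pre-cut segments, the log scores are invented numbers rather than real model outputs, and the four-state grammar is hypothetical (it is not the automaton of FIG. 3, so the state numbers differ from states 0 and 9 mentioned above). A Viterbi-style search then finds the best label sequence that consumes every segment.

```python
import math
from typing import Dict, List, Optional, Set, Tuple

UNK_HEAD, UNK_TAIL = "<unk-head>", "<unk-tail>"
Arcs = Dict[int, List[Tuple[str, int]]]


def make_grammar(with_unregistered: bool) -> Tuple[Arcs, Set[int]]:
    """Hypothetical grammar, with or without unregistered word arcs."""
    arcs: Arcs = {
        0: [("right", 1), ("left", 1)],
        1: [("90deg", 2)],
        2: [("turn", 3)],
        3: [],
    }
    if with_unregistered:
        arcs[0].append((UNK_HEAD, 0))
        arcs[3].append((UNK_TAIL, 3))
    return arcs, {3}


def decode(
    segment_scores: List[Dict[str, float]], arcs: Arcs, finals: Set[int]
) -> Optional[Tuple[float, List[str]]]:
    """Best (log score, labels) path that consumes every segment, or None."""
    best: Dict[int, Tuple[float, List[str]]] = {0: (0.0, [])}
    for scores in segment_scores:
        nxt_best: Dict[int, Tuple[float, List[str]]] = {}
        for state, (logp, labels) in best.items():
            for label, nxt in arcs[state]:
                cand = logp + scores.get(label, -math.inf)
                if cand > nxt_best.get(nxt, (-math.inf, []))[0]:
                    nxt_best[nxt] = (cand, labels + [label])
        best = nxt_best
    reached = [v for s, v in best.items() if s in finals]
    return max(reached, key=lambda v: v[0]) if reached else None


# Invented log scores for "etto / migi ni / 90-do / maware / yo".
SEGMENTS = [
    {UNK_HEAD: -1.0, "right": -8.0, "left": -9.0},
    {"right": -1.0, "left": -6.0, UNK_HEAD: -5.0},
    {"90deg": -1.0},
    {"turn": -1.0},
    {UNK_TAIL: -1.0, "turn": -7.0},
]

proposed = decode(SEGMENTS, *make_grammar(with_unregistered=True))
conventional = decode(SEGMENTS, *make_grammar(with_unregistered=False))
words = [w for w in proposed[1] if not w.startswith("<unk")]
```

In this toy setting the grammar with unregistered word arcs yields the registered words "right", "90deg", "turn", while the grammar without them has no path that covers all five segments. A real frame-level decoder would not simply fail in that case; it would be forced to align the hesitation frames to registered words, which is how misrecognition arises.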

[0032]

[Effects of the Invention] The present invention makes it possible to realize a speech recognition device that recognizes not only speech input consisting solely of registered words but also natural, human-like continuous utterances that contain unregistered words.

[0033] It also allows the recognition rate to be improved, places fewer restrictions on what the user may say so that free utterances can be accepted, and thereby broadens the range of applications.

[Brief description of drawings]

[FIG. 1] A block diagram showing a hardware implementation of the conventional method.

[FIG. 2] A block diagram showing a hardware implementation of an embodiment of the present invention.

[FIG. 3] A diagram showing the grammatical constraints, given as a finite state automaton, used in the embodiment of the present invention.

[Explanation of symbols]

1 Acoustic analysis processing unit
2 Grammar constraint storage unit
3 Standard pattern creation processing unit
4 Word standard pattern concatenation processing unit
5 Word standard pattern storage unit
6 Sentence standard pattern storage unit
7 Pattern matching processing unit
8 Sentence-head cutout processing unit
9 In-sentence cutout processing unit
10 Sentence-end cutout processing unit
11 Unregistered word standard pattern storage unit

Claims

[Claims]

1. A continuous speech recognition method that recognizes continuously uttered speech by continuously matching word standard patterns represented by Markov models against input speech in accordance with grammatical constraints, characterized in that the positions at which unregistered word patterns may appear within the continuous speech are defined in the grammatical constraints, and continuous speech containing unregistered words is recognized using the unregistered word patterns.

2. The continuous speech recognition method according to claim 1, characterized in that three kinds of patterns, for the beginning, middle, and end of a sentence, are created as the unregistered word patterns and matched against unregistered words.