JPH02250097A

JPH02250097A - Speech recognition system

Info

Publication number: JPH02250097A
Application number: JP1070936A
Authority: JP
Inventors: Hiromi Shibuya; 渋谷　浩洋; Munekazu Maeda; 宗万前田; Yasutomo Onishi; 大西　康友
Original assignee: Matsushita Refrigeration Co
Current assignee: Panasonic Holdings Corp
Priority date: 1989-03-23
Filing date: 1989-03-23
Publication date: 1990-10-05

Abstract

PURPOSE: To prevent the recognition rate from decreasing by inhibiting a speech analysis means from performing speech analysis processing unless the speaker inputs a voicing start signal to the speech analysis means through a voicing indication means. CONSTITUTION: When the speaker is about to voice a recognizable word, the start of voicing can be conveyed to the speech recognition system by the voicing indication means 9. That is, when the switch 11 of the voicing indication means 9 is turned on, an 'H' signal (voicing start signal) is input to the speech analysis means B10 and speech analysis processing is performed. When the switch 11 of the voicing indication means 9 is turned off, an 'L' signal is input to the speech analysis means B10 and speech analysis processing is not performed. The period in which the speaker voices a recognizable word is thus fixed, which suppresses the voicing of words other than recognizable words and raises the recognition rate.

Description

[Detailed Description of the Invention]

Industrial Field of Application

The present invention relates to a speech recognition system that recognizes word speech input by specific and unspecified speakers and performs various processes according to the speech content. In particular, it relates to a speech recognition system for vending machines that supports unspecified speakers.

Description of the Related Art

Conventionally, a speech recognition system for vending machines, such as vending machines for cup beverages (hereinafter simply called cup vending machines), works as shown in FIG. 4. First, the speech that the user inputs through microphone 1 is analyzed by speech analysis means A2 to extract a speech pattern. The analysis uses BPF (band-pass filter) analysis with a bank of band-pass filters: the analysis results are sampled along the time and frequency axes and the intensity is processed digitally. Standard pattern storage means 3 stores in advance, as standard patterns, the speech patterns of a plurality of discrete words uttered by many unspecified speakers and extracted by the same technique. The words stored as standard patterns are the names of the flavors sold by the cup vending machine (product names of beverages such as coffee and juice) and several response words ("yes", "no", "hot", "ice", and so on). Standard pattern selection means 4 then selects, by the DP (dynamic programming) matching method, the standard pattern closest to the input speech pattern, and the speech is thereby recognized.

The DP in DP matching stands for dynamic programming, a mathematical programming technique proposed by Bellman of the United States in 1967 and applied to the optimization of multistage decision processes. At each stage a decision (control) is made that transforms the state, and a function that evaluates how good or bad the control is over the whole process of reaching the target state is maximized or minimized. When the speech recognition system is for a specific speaker, the speech patterns of the recognition words uttered by that speaker are registered in standard pattern storage means 3. When it is for unspecified speakers, several representative patterns of the recognition words uttered by a large number of unspecified speakers are registered.
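The selection performed by standard pattern selection means 4 can be illustrated with a minimal Python sketch of DP matching (commonly known as dynamic time warping). The function names, the city-block frame distance, the three permitted step directions, and the toy two-band patterns are all assumptions made for illustration; the patent does not specify any of them.

```python
from math import inf


def dp_distance(a, b):
    """Cumulative DP-matching distance between two speech patterns.

    Each pattern is a list of frames; each frame is a list of band
    intensities (for example, one value per band-pass filter).
    """
    def frame_dist(x, y):
        # City-block distance between two frames (an assumed local cost).
        return sum(abs(p - q) for p, q in zip(x, y))

    n, m = len(a), len(b)
    # cost[i][j] is the best cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_dist(a[i - 1], b[j - 1])
            # Each stage keeps only the cheapest way of reaching (i, j).
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]


def select_standard_pattern(input_pattern, standard_patterns):
    """Return the word whose stored pattern is closest to the input."""
    return min(standard_patterns,
               key=lambda word: dp_distance(input_pattern,
                                            standard_patterns[word]))


# Hypothetical standard patterns: three frames of two band intensities each.
STANDARD_PATTERNS = {
    "coffee": [[1, 5], [2, 6], [3, 7]],
    "juice": [[9, 1], [8, 2], [7, 1]],
}
```

Because the alignment may stretch or compress the time axis, an input spoken more slowly than the stored pattern (for example with its first frame repeated) can still match that pattern at zero cost.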

Utterance guidance means 5 consists of speech synthesis means and, according to the processing of control means 6 described later, gives spoken guidance that prompts the user to speak. For example, at flavor selection it says "Welcome. What would you like?" to prompt the user to say a flavor name. The flavor names are clearly displayed on the front panel of the cup vending machine or a similar place, and the user picks one preferred flavor name from among them and says it.

Control means 6, according to the processing, instructs utterance guidance means 5 to speak the guidance, recognizes the word spoken by the user from the standard pattern selected by standard pattern selection means 4, and controls the subsequent operation of the cup vending machine according to the recognition result. Reference numeral 7 denotes coin handling means, which accepts coins and pays out change, and 8 denotes beverage dispensing means, which pours the selected flavor into a cup and delivers it.

Problems to Be Solved by the Invention

With the method described above, however, the period during which the speaker utters a recognizable word is not fixed. For example, a speaker who hesitates while choosing a flavor may say "Um, (flavor name)" and so utter words other than the recognizable ones. Such utterances cannot be recognized correctly, and the recognition rate falls. The present invention solves this problem of the prior art. Its object is to provide a speech recognition system that fixes the period during which the speaker utters a recognizable word, thereby suppressing utterances other than recognizable words and raising the recognition rate.

Means for Solving the Problems

To solve the above problem, the speech recognition system of the present invention comprises: standard pattern storage means that stores a group of standard patterns of a plurality of discrete word utterances; speech analysis means that analyzes the speaker's speech and extracts a speech pattern; voicing indication means that notifies the speech analysis means, by a voicing start signal, that the speaker has started speaking; standard pattern selection means that selects, from the standard pattern group, the standard pattern closest to the speech pattern extracted by the speech analysis means; and utterance guidance means that guides the speaker to utter a word.

Operation

With the above configuration, when the speaker is about to utter a recognizable word, the start of voicing can be conveyed to the speech recognition system by the voicing indication means.

Embodiment

A speech recognition system according to one embodiment of the present invention is described below with reference to the drawings. In this embodiment, a speech recognition system for unspecified speakers is applied to a cup vending machine. Components identical to those of the conventional example carry the same reference numerals and are not described again.

FIG. 1 is a functional block diagram of the speech recognition system in the embodiment of the present invention. Reference numeral 9 denotes voicing indication means, which outputs a voicing start signal when the speaker is about to utter a recognizable word. Reference numeral 10 denotes speech analysis means B, which, when the voicing start signal is input from voicing indication means 9, analyzes the speech input through microphone 1 and extracts a speech pattern. FIG. 2 is a circuit diagram of voicing indication means 9 in the embodiment. Reference numeral 11 denotes a switch and 12 a resistor. When the speaker turns switch 11 ON, an "H" signal is input to speech analysis means B10.

The table shows the operating relationship between voicing indication means 9 and speech analysis means B10. As the table shows, when switch 11 of voicing indication means 9 is turned ON, an "H" signal (voicing start signal) is input to speech analysis means B10 and speech analysis processing is performed. When switch 11 of voicing indication means 9 is turned OFF, an "L" signal is input to speech analysis means B10 and speech analysis processing is not performed.
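The ON/OFF relationship described above amounts to gating the analyzer on the level of one input line. The following Python sketch illustrates that behavior only; the class name, the `feed` method, and the representation of audio as labelled frames are hypothetical and do not come from the patent, which describes a hardware switch and resistor.

```python
class GatedSpeechAnalyzer:
    """Analyzes audio frames only while the voicing start signal is 'H'."""

    def __init__(self):
        # Frames that actually went through speech analysis processing.
        self.analyzed = []

    def feed(self, frame, switch_on):
        # Switch 11 ON gives an 'H' level, OFF gives an 'L' level.
        signal = "H" if switch_on else "L"
        if signal == "H":
            self.analyzed.append(frame)  # analysis is performed
        # On 'L' the frame is ignored: no analysis is performed.
        return signal


analyzer = GatedSpeechAnalyzer()
# The speaker hesitates ("um") with the switch OFF, then presses it
# and says the flavor name.
levels = [
    analyzer.feed("um", switch_on=False),
    analyzer.feed("cof", switch_on=True),
    analyzer.feed("fee", switch_on=True),
    analyzer.feed("(silence)", switch_on=False),
]
```

In this sketch the hesitation never reaches the pattern matcher, which is the effect the embodiment relies on to keep the recognition rate from dropping.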

The vending operation of the speech recognition system for a cup vending machine configured as above is described below with reference to the flowchart of FIG. 3.

In FIG. 3, first, in step 201, it is determined whether a coin has been inserted into coin handling means 7. Once a coin has been inserted, the process proceeds to step 202. In step 202, utterance guidance means 5 gives the guidance "Welcome. What would you like?"

Then, in step 203, the system waits until the voicing start signal arrives from voicing indication means 9. When the signal arrives, in step 204 standard pattern selection means 4 selects, from the standard patterns stored in standard pattern storage means 3, the standard pattern closest to the input speech pattern, and the flavor name is recognized. In step 205, it is determined whether the recognition result of step 204 is acceptable. If the result is rejected, the process proceeds to step 206, where utterance guidance means 5 gives the guidance "Please answer once more", and then returns to step 203. If the result is not rejected, the process proceeds to step 207.

Step 207 branches the subsequent operation according to the flavor recognized in step 204. In this embodiment coffee is assumed to have been recognized; the operation for any other flavor is the same as for coffee and is not described. Next, in step 208, utterance guidance means 5 confirms with "Coffee, is that right?" and prompts the customer to reply. In step 209, the system waits until the voicing start signal arrives from voicing indication means 9. When the signal arrives, in step 210 the reply, yes or no, is recognized in the same way as the flavor name. In step 211, it is determined whether the recognition result of step 210 is acceptable. If the result is rejected, the process returns to step 208; otherwise it proceeds to step 212.

In step 212, if the reply recognized in step 210 is yes, the process proceeds to step 213; if it is no, the process returns to step 208. In step 213, beverage dispensing means 8 pours the coffee into a cup and delivers it. In step 214, if change is due, coin handling means 7 pays it out. Finally, in step 215, utterance guidance means 5 says "Thank you very much" and the series of operations (the sale) ends.
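The flow of steps 202 to 215 can be sketched as a small Python function. This is an illustrative reading of the flowchart description, not the patented control program: the function name `run_sale`, the use of `None` for a rejected recognition, and the log strings are assumptions. The sketch starts after a coin has been inserted (step 201), and each item taken from `recognized` stands for one utterance made after the voicing start signal.

```python
def run_sale(recognized, change=0):
    """Replay the sales dialogue for a scripted list of recognition results.

    recognized: recognition results in the order they occur; None means
    the result was rejected. Raises StopIteration if the script runs out.
    """
    results = iter(recognized)
    log = ["say: Welcome. What would you like?"]       # step 202
    flavor = next(results)                             # steps 203-204
    while flavor is None:                              # step 205: rejected
        log.append("say: Please answer once more.")    # step 206
        flavor = next(results)                         # back to step 203
    while True:                                        # step 207: branch on flavor
        log.append(f"say: {flavor}, is that right?")   # step 208
        answer = next(results)                         # steps 209-210
        if answer == "yes":                            # steps 211-212
            break
        # A rejected reply or a "no" returns to step 208, as in the text.
    log.append(f"dispense: {flavor}")                  # step 213
    if change > 0:
        log.append(f"refund: {change}")                # step 214
    log.append("say: Thank you very much.")            # step 215
    return log


sale_log = run_sale([None, "coffee", None, "yes"], change=20)
```

Note that, following the description, a "no" reply leads back to the confirmation prompt of step 208 rather than to flavor selection.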

As described above, in this embodiment the speech analysis means does not perform speech analysis processing unless the speaker inputs a voicing start signal to it by means of the voicing indication means. Because the speaker signals the start of voicing only when uttering a recognizable word, cases in which a speaker hesitating over a flavor says "Um, (flavor name)" and causes misrecognition are sharply reduced, and a drop in the recognition rate can be prevented.

In addition, because speakers themselves decide the period during which they utter a recognizable word, they can carry on the dialogue at their own pace with peace of mind. Moreover, a speaker in a hurry can buy a product quickly without waiting for the voice guidance, provided the utterances are made in the correct order. The benefit is considerable.

Effects of the Invention

As described above, the speech recognition system of the present invention is provided with: standard pattern storage means that stores a group of standard patterns of a plurality of discrete word utterances; speech analysis means that analyzes the speaker's speech and extracts a speech pattern; voicing indication means that notifies the speech analysis means, by a voicing start signal, that the speaker has started speaking; standard pattern selection means that selects, from the standard pattern group, the standard pattern closest to the speech pattern extracted by the speech analysis means; and utterance guidance means that guides the speaker to utter a word. Because the speaker can decide when voicing starts, speech analysis of words other than the recognizable words is sharply reduced, and a drop in the recognition rate can be prevented.

[Brief explanation of drawings]

FIG. 1 is a functional block diagram of the speech recognition system in an embodiment of the present invention. FIG. 2 is a circuit diagram of the voicing indication means of FIG. 1. FIG. 3 is a flowchart showing an example of the operation of the speech recognition system in the embodiment. FIG. 4 is a functional block diagram of a conventional speech recognition system.

3: standard pattern storage means
4: standard pattern selection means
5: utterance guidance means
9: voicing indication means
10: speech analysis means B

Agent: Patent attorney Shigetaka Awano and one other

Claims

[Claims]

A speech recognition system comprising: standard pattern storage means that stores a group of standard patterns of a plurality of discrete word utterances; speech analysis means that analyzes a speaker's speech and extracts a speech pattern; standard pattern selection means that selects, from the standard pattern group, the standard pattern closest to the speech pattern extracted by the speech analysis means; and utterance guidance means that guides the speaker to utter a word.