JP2000194386A

JP2000194386A - Voice recognizing and responding device

Info

Publication number: JP2000194386A
Application number: JP10368099A
Authority: JP
Inventors: Shoji Kitagawa; 昇治北川; Hisataka Yamagishi; 久高山岸; Koji Omoto; 大本　　浩司; Hiroshi Nakajima; 宏中嶋
Original assignee: Omron Corp; Omron Tateisi Electronics Co
Current assignee: Omron Corp
Priority date: 1998-12-24
Filing date: 1998-12-24
Publication date: 2000-07-14

Abstract

PROBLEM TO BE SOLVED: To make a voice recognition response device easier for the user to use, and to let the user carry on a more efficient and natural dialogue with the device. SOLUTION: When a recognition result judging device 14 judges that the voice information corresponds to one item of character information, a skill judging device 15 judges the user's skill and an environmental degree judging device 16 judges the user's environmental degree. A voice guide selecting device 18 then refers to a voice guide dictionary 21 and selects the type and speed of the voice guide based on the skill judged by the skill judging device 15, the environmental degree judged by the environmental degree judging device 16, and the control result of a dialogue controller 17.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a voice recognition response device and method using a telephone, and more particularly to a voice recognition response device and method that are easy for the user to use and enable an efficient, natural dialogue.

[0002]

2. Description of the Related Art In recent years, voice recognition response devices that let a user purchase goods or reserve airline tickets by conversing with a machine over the telephone have come into wide use.

[0003] As an example, consider reserving an airline ticket with such a device. To reserve a ticket departing from Haneda for Sapporo at 5:30 pm on December 31, 1998, the user converses with the device as follows.

[0004] There are three approaches, (1), (2), and (3), described below.

[0005] In the following description, system prompts (utterances by the device) are enclosed in double quotation marks, and the user's replies to them are enclosed in single quotation marks.

[0006] (1) The user answers the device's fixed voice guidance one question at a time, and each input item is confirmed individually.

[0007] For example, as shown in FIG. 9(a), the user answers 'Haneda' to the question "Please say your departure point," answers 'Yes' to the confirmation "Is your departure point Haneda Airport?", answers 'Sapporo' to the question "Please say your destination," answers 'Yes' to the confirmation "Is your destination Sapporo Airport?", and then answers the question "Please say your departure date" with the departure date in the same way.

[0008] (2) The dialogue is still basically one question and one answer at a time, but input items are grouped to some extent, such as "year," "month," and "day," or "departure point" and "destination," and are entered and confirmed together.

[0009] For example, as shown in FIG. 9(b), the user answers 'Haneda' to the question "Please say your departure point," answers 'Sapporo' to the question "Where are you going from Haneda Airport?", and answers 'Yes' to the two-item confirmation "Departing Haneda Airport for Sapporo Airport, is that correct?"

[0010] (3) All required items are entered at once and confirmed together afterwards.

[0011] For example, as shown in FIG. 9(c), the user answers 'From Haneda to Sapporo, 5:30 pm on December 31, 1998' to the question "Please say your departure point, destination, and departure date and time," and answers 'Yes' to the confirmation "Departing Haneda Airport for Sapporo Airport at 5:30 pm on December 31, 1998, is that correct?"

[0012]

PROBLEMS TO BE SOLVED BY THE INVENTION However, in the conventional voice recognition response device described above, the device's voice guidance and dialogue flow are fixed regardless of the user or the user's situation: the same exchanges are always repeated in the same order, with the same voice, the same tone, and the same voice guidance.

[0013] Consequently, approaches (1), (2), and (3) each have the following problems.

[0014] (1) When recognition errors persist, the same exchange must be repeated again and again. For example, as shown in FIG. 10(a), the user answers 'Haneda' to the question "Please say your departure point" and 'No' to the confirmation "Is your departure point Hakata Airport?"; then answers 'It's Haneda' to the repeated question "Please say your departure point" and 'That's wrong' to the confirmation "Is your departure point Hakodate Airport?", whereupon the device says "Your answer could not be recognized. Please answer 'Yes' or 'No'." The same exchange is thus repeated.

[0015] Moreover, when made to repeat the same word, users tend to speak more slowly, either breaking it into syllables ('ha-ne-da') or drawing it out ('haa-nee-daa'). This actually makes the speech harder to recognize and, as a result, causes misrecognition to recur.

[0016] (2) With a one-question-one-answer dialogue, the user must speak many times and the dialogue takes longer. For example, as shown in FIG. 10(b), the user answers 'Haneda' to the question "Please say your departure point," then falls silent ('...') at the question "Where are you going from Hakata Airport?"; the device says "Your answer could not be recognized" and asks "Where are you going from Hakata Airport?" again.

[0017] As another example, shown in FIG. 11(c), the user answers 'Haneda' to the question "Please say your departure point" and 'Sapporo' to the question "Where are you going from Hakata Airport?", answers 'No' to "Departing Hakata Airport for Sapporo Airport, is that correct?" and 'No' to "Is your departure point Hakata Airport?", and the dialogue returns to the first question, "Please say your departure point."

[0018] As described above, having the user enter one word at a time, so that even a beginner can follow, and confirm each entry with 'Yes' or 'No' ensures that even an inexperienced user can reliably enter the required items. Compared with talking to a human operator, however, the user must carry on an unnatural conversation with a machine for a very long time, which imposes a burden of time and stress on the user.

[0019] (3) Batch input and batch confirmation can actually lengthen the time required. For example, as shown in FIG. 11(d), the user answers 'From Haneda to Sapporo, 5:30 pm on December 31, 1998' to the question "Please say your departure point, destination, and departure date and time," and answers 'No' to the confirmation "Departing Haneda Airport for Sapporo Airport at 5:50 pm on October 31, 1999, is that correct?" The device then asks "Is your departure point Haneda Airport?" ('Yes'), "Is your destination Sapporo Airport?" ('Yes'), and "Is your departure year 1999?" ('No'), followed by "Please say your departure year," and so on.

[0020] Thus, when items are entered and confirmed in a batch, any misrecognized item must be located and corrected, and this exchange is very tedious and taxes the user's patience. In addition, a beginner is very unlikely to be able to follow such a voice guide and then speak without hesitating or making mistakes; beginners often fall silent, not knowing what to say. This makes the user anxious and also lengthens the time required.

[0021] A further problem common to (1), (2), and (3) is that voice guidance tailored to inexperienced users is redundant for experienced users. With fixed voice guidance, the user must listen and understand at the timing and speed of the system's speech, which is far more demanding than reading text displayed statically on a screen. Moreover, unlike text, speech cannot be skimmed, so when a detailed guide for beginners is played, every user must listen to it for the same length of time whether or not they need it, which severely tries the patience of experienced users.

[0022] Accordingly, an object of the present invention is to provide a voice recognition response device and method that are easy for the user to use and enable an efficient, natural dialogue.

[0023]

MEANS FOR SOLVING THE PROBLEMS To achieve the above object, the invention of claim 1 is a voice recognition response device that recognizes and responds to utterances made by the other party over a telephone, comprising: utterance state detecting means for detecting the utterance state of the other party; and utterance control means for controlling utterances to the other party based on the utterance state of the other party detected by the utterance state detecting means.

[0024] The invention of claim 2 is the invention of claim 1, wherein the utterance state detecting means comprises proficiency detecting means for detecting the proficiency of the other party's utterances, and environmental level detecting means for detecting the environmental level of the other party's utterances.

[0025] The invention of claim 3 is the invention of claim 1, wherein the utterance control means controls the content and speed of the utterances.

[0026] The invention of claim 4 is the invention of claim 2, wherein the proficiency detecting means detects the proficiency based on the time the other party takes to respond to an utterance from the device and on the duration of the other party's response.
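The claim names the two measurements but not how they are combined. A minimal sketch of one possible combination follows, assuming per-turn timings in seconds are available from the recognizer. The `TurnTiming` structure, the three level names, and every threshold value are hypothetical choices for illustration, not values taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class TurnTiming:
    """Timing of one caller turn, in seconds (hypothetical structure)."""
    response_delay: float     # end of the device's prompt -> start of the caller's speech
    response_duration: float  # length of the caller's utterance


def estimate_proficiency(turns, fast_delay=1.0, slow_delay=3.0, max_duration=4.0):
    """Classify the caller as 'expert', 'intermediate' or 'novice'.

    Assumed heuristic: callers who answer quickly and without drawn-out
    utterances are treated as proficient; long hesitations suggest a novice.
    """
    if not turns:
        return "novice"  # no evidence yet: start with the most supportive guidance
    avg_delay = sum(t.response_delay for t in turns) / len(turns)
    avg_duration = sum(t.response_duration for t in turns) / len(turns)
    if avg_delay <= fast_delay and avg_duration <= max_duration:
        return "expert"
    if avg_delay >= slow_delay:
        return "novice"
    return "intermediate"


# Two quick, short answers: treated as an experienced caller.
print(estimate_proficiency([TurnTiming(0.5, 1.2), TurnTiming(0.8, 1.5)]))
```

Averaging over all turns so far smooths out a single slow answer; a real system might instead weight recent turns more heavily.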

[0027] The invention of claim 5 is the invention of claim 2, wherein the environmental level detecting means detects the environmental level based on the strength of the ambient noise on the other party's side, the strength of the other party's speech, and the difference between the strength of the other party's speech and the strength of the ambient noise.
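The three quantities in this claim map naturally onto a noise level, a speech level, and their difference (a signal-to-noise ratio). A minimal sketch follows, assuming the line audio is available as floating-point samples in the range -1.0 to 1.0 and that a silent stretch and a speech stretch have already been segmented. The grade names and all dB thresholds are hypothetical.

```python
import math


def level_db(samples):
    """RMS level of a block of samples in dB relative to full scale (1.0)."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(rms) if rms > 0 else float("-inf")


def environmental_level(noise, speech, noisy_db=-30.0, quiet_speech_db=-35.0,
                        min_snr_db=15.0):
    """Grade the caller's environment as 'good', 'fair' or 'poor'.

    noise  -- samples captured while the caller is silent (background noise)
    speech -- samples captured while the caller is speaking
    """
    noise_db = level_db(noise)
    speech_db = level_db(speech)
    snr_db = speech_db - noise_db  # speech strength minus noise strength
    if noise_db >= noisy_db or snr_db < min_snr_db:
        return "poor"  # loud background, or speech barely above it
    if speech_db < quiet_speech_db:
        return "fair"  # clean line, but the caller is speaking very softly
    return "good"
```

Working in dB makes the "difference" term a simple subtraction, since a ratio of RMS levels becomes a difference of logarithms.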

[0028] The invention of claim 6 is a voice recognition response device that recognizes and responds to utterances made by the other party over a telephone, comprising: speech recognition means for recognizing the other party's utterances as character information; proficiency detecting means for detecting the proficiency of the recognized utterances of the other party; environmental level detecting means for detecting the environmental level of the other party's utterances; and utterance content changing means for changing the content of the device's utterances based on the proficiency and environmental level of the other party's utterances.
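The abstract describes a voice guide selecting device that consults a voice guide dictionary to choose the type and speed of the guidance. A minimal sketch of such a lookup is shown below, assuming proficiency and environment have already been graded into discrete labels. The prompt wording, the speed factors, and the rule that a poor environment overrides proficiency are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceGuide:
    text: str
    speed: float  # speaking rate relative to normal (1.0)


# Hypothetical "voice guide dictionary": wording per proficiency level.
GUIDE_TEXT = {
    "novice": "Please say the name of the airport you are departing from, for example, Haneda.",
    "intermediate": "Please say your departure airport.",
    "expert": "Departure airport?",
}
BASE_SPEED = {"novice": 0.9, "intermediate": 1.0, "expert": 1.2}


def select_guide(proficiency, environment):
    """Pick prompt wording and speed from proficiency and environmental level."""
    text = GUIDE_TEXT[proficiency]
    speed = BASE_SPEED[proficiency]
    if environment == "poor":
        # In a noisy setting, use the most explicit wording and do not speak fast.
        text = GUIDE_TEXT["novice"]
        speed = min(speed, BASE_SPEED["novice"])
    return VoiceGuide(text, speed)


print(select_guide("expert", "good"))
print(select_guide("expert", "poor"))
```

Keeping the wording in a table rather than in code mirrors the idea of a separate guide dictionary: new levels or prompts can be added without changing the selection logic.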

[0029] The invention of claim 7 is the invention of claim 6, wherein the speech recognition means comprises recognition candidate extracting means for extracting a plurality of recognition candidates, and a rule for selecting one confirmed candidate from among the plurality of recognition candidates extracted by the recognition candidate extracting means, and selects one confirmed candidate from among the plurality of recognition candidates using the rule.
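One way to read this claim is as filtering an N-best list with task knowledge. A minimal sketch follows, assuming the recognizer returns `(word, score)` pairs where a higher score is better. The example rule (a destination cannot equal the departure airport) and the scores are hypothetical; the patent does not state which rule it uses.

```python
from typing import Callable, List, Optional, Tuple


def select_candidate(candidates: List[Tuple[str, float]],
                     rule: Callable[[str], bool]) -> Optional[str]:
    """Return the best-scoring recognition candidate that satisfies `rule`.

    candidates -- (word, score) pairs from the recognizer's N-best list
    rule       -- predicate encoding task knowledge; True means allowed
    """
    for word, _score in sorted(candidates, key=lambda c: c[1], reverse=True):
        if rule(word):
            return word
    return None  # no candidate survives the rule: ask the caller again


# Example rule (hypothetical): the destination cannot equal the departure airport.
departure = "Haneda"
n_best = [("Haneda", 0.81), ("Hakata", 0.77), ("Hakodate", 0.52)]
print(select_candidate(n_best, lambda w: w != departure))
```

Passing the rule as a predicate keeps the selection loop generic, so different dialogue states can supply different constraints.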

[0030] The invention of claim 8 is a voice recognition response method for recognizing and responding to utterances made by the other party over a telephone, comprising the steps of: detecting the utterance state of the other party; and controlling utterances to the other party based on the detected utterance state of the other party.

[0031] The invention of claim 9 is the invention of claim 8, wherein the utterance state is detected by detecting the proficiency of the other party's utterances and further detecting the environmental level of the other party's utterances.

[0032] The invention of claim 10 is the invention of claim 8, wherein controlling the utterances comprises controlling the content and speed of the utterances.

[0033] The invention of claim 11 is the invention of claim 9, wherein the proficiency is detected based on the time the other party takes to respond to an utterance from the device and on the duration of the other party's response.

[0034] The invention of claim 12 is the invention of claim 9, wherein the environmental level is detected based on the strength of the ambient noise on the other party's side, the strength of the other party's speech, and the difference between the strength of the other party's speech and the strength of the ambient noise.

[0035] The invention of claim 13 is a voice recognition response method for recognizing and responding to utterances made by the other party over a telephone, comprising the steps of: recognizing the other party's utterances as character information; detecting the proficiency of the recognized utterances of the other party; detecting the environmental level of the other party's utterances; and changing the content of the device's utterances based on the proficiency and environmental level of the other party's utterances.

[0036] The invention of claim 14 is the invention of claim 13, wherein the speech recognition comprises a step of extracting a plurality of recognition candidates and a rule for selecting one confirmed candidate from among the extracted recognition candidates, and one confirmed candidate is selected from among the plurality of recognition candidates using the rule.

[0037] The invention of claim 15 is a voice recognition response device that recognizes and responds to utterances made by the other party over a telephone, comprising: utterance content recognizing means for recognizing the content of the other party's utterances; utterance content confirming means for confirming the other party's utterance content recognized by the utterance content recognizing means; and utterance content changing means for changing the content of utterances to the other party based on the confirmation result of the utterance content confirming means.

[0038] The invention of claim 16 is the invention of claim 15, wherein the utterance content confirming means comprises recognition target candidate detecting means for detecting a plurality of recognition target candidates, and related recognition target acquiring means for acquiring related recognition targets that include the recognition target; and when the other party's utterance content cannot be confirmed by the utterance content recognizing means, the other party's utterance content is identified by matching the plurality of recognition target candidates detected by the recognition target candidate detecting means against the related recognition targets, including the recognition target, acquired by the related recognition target acquiring means.

[0039] The invention of claim 17 is the invention of claim 15, wherein, when the other party's utterance contains the reading "shichi" for the number 7, the utterance content confirming means performs the confirmation using the reading "nana" instead.
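Japanese has two standard readings for 7, "shichi" and "nana", and the claim specifies that confirmations use "nana"; the excerpt does not state the motivation. A minimal sketch follows, assuming the recognized digits are available as a list of romanized readings. The table name and the token format are hypothetical.

```python
# Readings to substitute when echoing recognized numbers back to the caller.
PREFERRED_READING = {"shichi": "nana"}


def confirmation_readings(readings):
    """Return the readings to use in a confirmation prompt.

    readings -- romanized readings of the recognized digits, in order
    """
    return [PREFERRED_READING.get(r, r) for r in readings]


print(" ".join(confirmation_readings(["zero", "san", "shichi", "ichi"])))
```

A lookup table rather than a single `if` leaves room for other reading substitutions, should a deployment need them.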

[0040] The invention of claim 18 is the invention of claim 15, wherein the utterance content confirming means has utterance storing means for storing the other party's utterances, and when the other party's utterance content has failed to be confirmed a predetermined number of times or more, the stored utterance of the other party is included in the utterance requesting confirmation.
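A minimal sketch of a confirmation prompt that switches to playing back the caller's own recording after repeated failures follows. It assumes prompts are assembled as a list of playback segments, either synthesized text or recorded audio. The segment format, the prompt wording, and the default threshold of 2 are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Segment = Tuple[str, object]  # ("tts", text) or ("audio", recorded bytes)


@dataclass
class ConfirmationState:
    max_failures: int = 2                   # the "predetermined number of times" (assumed value)
    failures: int = 0                       # confirmations the caller has rejected so far
    last_recording: Optional[bytes] = None  # caller's most recent utterance, as stored


def build_confirmation(state: ConfirmationState, recognized: str) -> List[Segment]:
    """Assemble a confirmation prompt as a list of playback segments."""
    if state.failures >= state.max_failures and state.last_recording is not None:
        # Repeated failures: play the caller's own words back before asking.
        return [("tts", "You said:"),
                ("audio", state.last_recording),
                ("tts", f"Is that {recognized}?")]
    return [("tts", f"Is that {recognized}?")]


state = ConfirmationState(failures=2, last_recording=b"...pcm...")
print(build_confirmation(state, "Haneda"))
```

Returning segments instead of a single string lets the same builder drive a player that interleaves synthesized speech with stored audio.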

[0041] The invention of claim 19 is a voice recognition response device that, in response to utterances made by the other party over a telephone, speaks voice output containing numbers, comprising: one-digit voice prompt storage means for storing one-digit voice prompts from 0 to 9; and two-digit voice prompt storage means for storing two-digit voice prompts from 00 to 99; wherein a number with an even number of digits is spoken using only the two-digit voice prompts stored in the two-digit voice prompt storage means, and a number with an odd number of digits is spoken using exactly one one-digit voice prompt stored in the one-digit voice prompt storage means.
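A minimal sketch of splitting a digit string into prompt identifiers under this scheme follows. It assumes that, for an odd-length number, the remaining digits are voiced with two-digit prompts, and it places the single one-digit prompt first; the claim does not say where that prompt goes, so the placement is an assumption of this sketch. The function name is hypothetical.

```python
def digit_prompt_sequence(number: str) -> list:
    """Split a digit string into prompt IDs for playback.

    Even-length numbers use only two-digit prompts ("00"-"99").
    Odd-length numbers use exactly one one-digit prompt ("0"-"9");
    placing it first is an assumption of this sketch.
    """
    if not number or any(c not in "0123456789" for c in number):
        raise ValueError(f"not a digit string: {number!r}")
    prompts = []
    start = 0
    if len(number) % 2 == 1:
        prompts.append(number[0])  # the single one-digit prompt
        start = 1
    prompts.extend(number[i:i + 2] for i in range(start, len(number), 2))
    return prompts


print(digit_prompt_sequence("1998"))
print(digit_prompt_sequence("12345"))
```

The benefit of two-digit prompts is fewer splice points between recordings, so a year such as 1998 plays as two natural-sounding chunks instead of four isolated digits.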

[0042] The invention of claim 20 is a voice recognition response device that extracts, from among a plurality of items, the item uttered by the other party based on the other party's utterance over a telephone, comprising: first registering means for registering the plurality of items; second registering means for registering the plurality of items registered in the first registering means, divided into groups related by a specific attribute; and relevance extracting means for extracting the relevance; wherein, when the item uttered by the other party cannot be extracted using the first registering means, the relevance extracting means extracts the relevance in the specific attribute that includes the uttered item, and the item uttered by the other party is extracted from the second registering means based on the relevance in that specific attribute.
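A simplified sketch of the two registries follows, using the city served as the grouping attribute. The registry contents and function name are hypothetical illustration data. The sketch reduces "extracting the relevance" to treating an unmatched utterance as a value of the grouping attribute, which is one possible reading of the claim rather than the patent's stated procedure.

```python
# First registry: every item the recognizer can return directly (hypothetical data).
AIRPORTS = {"Haneda", "Narita", "Itami", "Kansai"}

# Second registry: the same items grouped by one attribute, here the city served.
AIRPORTS_BY_CITY = {
    "Tokyo": {"Haneda", "Narita"},
    "Osaka": {"Itami", "Kansai"},
}


def resolve_airport(utterance: str) -> list:
    """Return the airport(s) the caller may mean, in sorted order."""
    if utterance in AIRPORTS:
        return [utterance]  # found directly in the first registry
    # Fallback: treat the utterance as a value of the grouping attribute.
    return sorted(AIRPORTS_BY_CITY.get(utterance, set()))


print(resolve_airport("Haneda"))
print(resolve_airport("Tokyo"))
```

When the fallback returns more than one airport, the dialogue can follow up with a narrowed question ("Haneda or Narita?") instead of rejecting the answer.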

[0043] The invention of claim 21 is a voice recognition response device that recognizes and responds to utterances made by the other party over a telephone, comprising: volume detecting means for detecting the other party's volume from the other party's utterances; and response volume control means for controlling the volume of the response based on the other party's volume detected by the volume detecting means.

[0044] The invention of claim 22 is the invention of claim 21, wherein the response volume control means makes the response volume louder as the other party's volume is louder, and quieter as the other party's volume is quieter.
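The claim only requires that the response volume rise and fall with the caller's volume. A minimal sketch using a clamped linear mapping in dB follows. The reference level, base level, gain, and clamp limits are hypothetical values chosen for illustration.

```python
def response_volume_db(caller_db, ref_caller_db=-26.0, base_db=-20.0,
                       gain=0.5, min_db=-30.0, max_db=-10.0):
    """Map the caller's measured speech level to a playback level (both in dB).

    Louder callers get a louder response and quieter callers a quieter one;
    the output is kept within [min_db, max_db].
    """
    level = base_db + gain * (caller_db - ref_caller_db)
    return max(min_db, min(max_db, level))


for caller in (-40.0, -26.0, -16.0):
    print(caller, response_volume_db(caller))
```

A gain below 1 means the device follows the caller's level only partially, which keeps one unusually loud or soft utterance from swinging the prompt volume too far.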

[0045] The invention of claim 23 is a voice recognition response method for recognizing and responding to utterances made by the other party over a telephone, comprising the steps of: recognizing the content of the other party's utterances; confirming the recognized utterance content of the other party; and changing the content of utterances to the other party based on the confirmation result.

[0046] The invention of claim 24 is the invention of claim 23, wherein confirming the utterance content comprises a step of detecting a plurality of recognition target candidates and a step of acquiring related recognition targets that include the recognition target; and when the other party's utterance content cannot be confirmed, the other party's utterance content is identified by matching the plurality of recognition target candidates detected in the candidate detecting step against the related recognition targets, including the recognition target, acquired in the acquiring step.

[0047] The invention of claim 25 is the invention of claim 23, wherein, when the other party's utterance contains the reading "shichi" for the number 7, the step of confirming the utterance content performs the confirmation using the reading "nana" instead.

[0048] The invention of claim 26 is the invention of claim 23, wherein the step of confirming the utterance content stores the other party's utterances, and when the other party's utterance content has failed to be confirmed a predetermined number of times or more, the stored utterance of the other party is included in the utterance requesting confirmation.

[0049] The invention of claim 27 is a voice recognition response method for speaking voice output containing numbers in response to utterances made by the other party over a telephone, which uses one-digit voice prompt storage means for storing one-digit voice prompts from 0 to 9 and two-digit voice prompt storage means for storing two-digit voice prompts from 00 to 99; wherein a number with an even number of digits is spoken using only the two-digit voice prompts stored in the two-digit voice prompt storage means, and a number with an odd number of digits is spoken using exactly one one-digit voice prompt stored in the one-digit voice prompt storage means.

[0050] The invention of claim 28 is a voice recognition response method for extracting, from among a plurality of items, the item uttered by the other party based on the other party's utterance over a telephone, which uses first registering means for registering the plurality of items, second registering means for registering the plurality of items registered in the first registering step divided into groups related by a specific attribute, and a relevance extracting step of extracting the relevance; wherein, when the item uttered by the other party cannot be extracted using the first registering means, the relevance extracting step extracts the relevance in the specific attribute that includes the uttered item, and the item uttered by the other party is extracted from the second registering means based on the relevance in that specific attribute.

[0051] The invention of claim 29 is a voice recognition response method for recognizing and responding to utterances made by the other party over a telephone, wherein the other party's volume is detected from the other party's utterances, and the volume of the response is controlled based on the detected volume of the other party.

[0052] The invention of claim 30 is the invention of claim 29, wherein the response volume is controlled so that it is louder as the other party's volume is louder, and quieter as the other party's volume is quieter.

[0053] The invention of claim 31 is a voice recognition response device that recognizes utterances made by the other party over a telephone and responds with a confirmation, comprising: confirmation prompt creating means for dividing the other party's utterance into confirmation items and creating a confirmation prompt; utterance time setting means for setting an utterance time for each confirmation item in the confirmation prompt; utterance content detecting means for detecting, when the other party makes an input during an utterance time set by the utterance time setting means, the content of that input utterance; and input time point detecting means for detecting the time point at which the other party's utterance was input; wherein the confirmation response is made based on the input time point of the other party's utterance detected by the input time point detecting means and the utterance content of the other party detected by the utterance content detecting means.
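A minimal sketch of using the barge-in time to tell which confirmation item the caller is objecting to follows. It assumes each item in the confirmation prompt takes a known number of seconds to speak and that a negative reply interrupting item *i* flags item *i*. The item names, durations, and negative-reply vocabulary are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate

NEGATIVE_REPLIES = {"no", "iie", "chigau"}  # hypothetical vocabulary


def item_at_time(durations, t):
    """Index of the confirmation item being spoken t seconds into the prompt."""
    if not durations:
        return None
    ends = list(accumulate(durations))  # end time of each item's time slot
    if t < 0 or t >= ends[-1]:
        return None
    return bisect_right(ends, t)


def item_to_correct(durations, t, reply):
    """Return the index of the item to re-ask when the caller barges in, else None."""
    if reply.lower() not in NEGATIVE_REPLIES:
        return None
    return item_at_time(durations, t)


items = ["departure point", "destination", "departure date"]
durations = [1.5, 1.0, 2.0]  # seconds each item takes to speak (assumed)
idx = item_to_correct(durations, 2.0, "No")
print(f"Please say your {items[idx]} again.")
```

Callers typically react slightly after hearing the wrong item, so a practical version might subtract a small reaction delay from `t` before the lookup; that adjustment is left out here to keep the slot logic visible.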

[0054] The invention of claim 32 is the invention of claim 31, further comprising confirmation prompt changing means for changing the content of the confirmation prompt based on the input time point of the other party's utterance detected by the input time point detecting means and the utterance content of the other party detected by the utterance content detecting means.

[0055] The invention of claim 33 is a voice recognition response method for recognizing utterances made by the other party over a telephone and responding with a confirmation, comprising the steps of: dividing the other party's utterance into confirmation items and creating a confirmation prompt; setting an utterance time for each confirmation item in the confirmation prompt; detecting, when the other party makes an input during an utterance time set in the setting step, the content of that input utterance; and detecting the time point at which the other party's utterance was input; wherein the confirmation response is made based on the input time point detected in the time point detecting step and the utterance content of the other party detected in the content detecting step.

[0056] According to a thirty-fourth aspect of the present invention, in the method of the thirty-third aspect, the content of the confirmation prompt is changed based on the input time of the other party's utterance detected in the input time detecting step and the utterance content of the other party detected in the utterance content detecting step.
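The aspects above do not state how an input time is matched to a confirmation item. One plausible reading is that each item occupies a known time window within the spoken prompt, so the moment at which the caller breaks in identifies the item being responded to. The sketch below illustrates that reading only. The item names, the durations, and the functions `build_windows` and `item_at` are all hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

Window = Tuple[str, float, float]  # (item name, start s, end s) within the prompt


def build_windows(items: List[Tuple[str, float]]) -> List[Window]:
    """Lay out consecutive utterance windows, one per confirmation item."""
    windows: List[Window] = []
    t = 0.0
    for name, duration in items:
        windows.append((name, t, t + duration))
        t += duration
    return windows


def item_at(windows: List[Window], input_time: float) -> Optional[str]:
    """Return the item whose window contains the caller's input time, if any."""
    starts = [start for _, start, _ in windows]
    k = bisect_right(starts, input_time) - 1
    if k >= 0 and input_time < windows[k][2]:
        return windows[k][0]
    return None


# Hypothetical prompt: "Departure Haneda, destination Sapporo, December 31st, 5:50 p.m.?"
windows = build_windows([("departure", 1.5), ("destination", 1.5),
                         ("date", 1.8), ("time", 1.6)])
print(item_at(windows, 2.0))  # destination
```

Under this reading, a "No" heard 2.0 s into the prompt would be attributed to the destination item. Per the thirty-second aspect, the prompt content could then be changed for that item alone.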

[0057]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

A first embodiment of the voice recognition response device according to the present invention will now be described in detail with reference to the drawings.

[0058] The following description again concerns the case where the present embodiment is applied to a telephone-based airline ticket reservation system. This system obtains the name of the departure airport, the name of the destination airport, and the date and time of departure by voice recognition over the telephone, and then reserves the corresponding airline ticket. An actual system would also need various other functions, such as the number of tickets, class information, payment method, and security. These functions are not directly related to the purpose of the present embodiment, so they are omitted from the following description.

[0059] FIG. 1 is a schematic block diagram showing the overall configuration of the voice recognition response device according to the present embodiment.

[0060] In FIG. 1, the voice recognition response device 10 comprises a voice input device 11, a voice waveform feature extraction device 12, a voice recognition device 13, a recognition result determination device 14, a proficiency level determination device 15, an environment level determination device 16, a dialogue control device 17, a voice guide selection device 18, a waveform feature analysis device 19, a voice output device 20, a voice guide dictionary 21, a recognition dictionary 22, a waveform feature database (waveform feature DB) 23, a frame/semantic dictionary 24, a recognition result database (recognition result DB) 25, and a dialogue history database (dialogue history DB) 26.

[0061] The voice input device 11 receives the voice of the other party. It consists of the microphone of a telephone handset or the like, and it converts the input voice into an electrical voice waveform signal.

[0062] The voice waveform feature extraction device 12 analyzes the voice waveform signal obtained by the voice input device 11. From the frequency, the signal strength, and their variation, it obtains the following features:
- the number of words uttered;
- the speaking rate and its variation;
- the utterance strength and its variation;
- the delay between the moment the user is able to speak and the moment the user actually speaks.
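The features listed in [0062] can be illustrated once the utterance has been segmented into words. The patent derives them from frequency and signal strength and does not describe a segmentation step. The `Segment` type, its fields, and the use of the population standard deviation as the "variation" measure are therefore assumptions made for this sketch.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List


@dataclass
class Segment:
    """One detected word (hypothetical segmentation output)."""
    start: float     # seconds from the moment the user is able to speak
    end: float       # must be greater than start
    strength: float  # e.g. RMS amplitude of the segment


def utterance_features(segments: List[Segment]) -> dict:
    """Compute the [0062] features from word segments (illustrative only)."""
    if not segments:
        raise ValueError("no speech detected")
    per_word_rate = [1.0 / (s.end - s.start) for s in segments]  # words per second
    strengths = [s.strength for s in segments]
    total = segments[-1].end - segments[0].start
    return {
        "word_count": len(segments),
        "speech_rate": len(segments) / total,
        "speech_rate_variation": pstdev(per_word_rate),
        "strength": mean(strengths),
        "strength_variation": pstdev(strengths),
        "onset_delay": segments[0].start,  # delay until the user actually speaks
    }


features = utterance_features([Segment(0.5, 1.0, 0.2), Segment(1.0, 1.5, 0.4)])
```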

[0063] The voice recognition device 13 converts the other party's utterance, contained in the voice waveform signal obtained by the voice input device 11, into character information. It does not produce a single result. Instead, it produces several recognition result candidates (an N-best list), each annotated with a confidence score indicating the certainty of recognition. The recognition results are stored in the recognition result database (recognition result DB) 25, and the final recognition result is determined by the recognition result determination device 14. Various methods of voice recognition processing are already known, so they are not described in detail here.

[0064] The recognition result determination device 14 determines the recognition result from among the candidates obtained by the voice recognition device 13, using the recognition dictionary 22 and the frame/semantic dictionary 24. This determination process will be described in detail later with reference to FIGS. 2 and 3.

[0065] The proficiency level determination device 15 determines the user's proficiency with this system on three levels: low, medium, and high. The flow of the dialogue, the content of the system's utterances, and the speaking rate are adjusted according to this proficiency level. The method of determining proficiency will be described in detail later with reference to FIG. 4.

[0066] The environment level determination device 16 evaluates environmental variables that affect voice recognition on three levels: excellent, good, and acceptable. These variables include the speaker's voice quality and loudness and the surrounding environment. The flow of the dialogue and the content of the system's utterances are adjusted accordingly. The method of determining the environment level will be described in detail later with reference to FIG. 5.

[0067] The dialogue control device 17 controls the flow of the dialogue with the user based on the results of the proficiency level determination device 15 and the environment level determination device 16.

[0068] The voice guide selection device 18 selects the voice guide and determines its speaking rate, based on the results of the proficiency level determination device 15 and the environment level determination device 16.
- When proficiency is high, a concise, short voice guide is spoken quickly.
- When proficiency is low, a detailed voice guide is spoken slowly.
- When the user remains silent, the user is prompted to input again.
- When the voice input level is low, the user is guided to speak louder.

The method of selecting the voice guide will be described in detail later with reference to FIG. 6.

[0069] The waveform feature analysis device 19 evaluates the features of the voice waveform extracted by the voice waveform feature extraction device 12. It then converts them into a form that the proficiency level determination device 15 can use.

[0070] The voice output device 20 converts an electrical signal into a voice signal. The speaker of a telephone handset or the like is used for this purpose.

[0071] The voice guide dictionary 21 holds the voice information of the voice guides spoken by the device.

[0072] The recognition dictionary 22 holds the dictionary used for voice recognition.

[0073] The waveform feature database (waveform feature DB) 23 holds two kinds of voice waveform features: those extracted by the voice waveform feature extraction device 12 and those analyzed by the waveform feature analysis device 19.

[0074] The frame/semantic dictionary 24 holds relations among voice recognition results together with semantic constraint information. The airline ticket reservation frame gives two examples:
- An airport slot constraint (semantic information): the departure slot and the destination slot cannot contain the same airport name.
- A date slot constraint (semantic information): the month slot must contain an integer from 1 to 12.

The structure of the frame/semantic dictionary 24, and the method of determining voice recognition results with it, will be described in detail later with reference to FIGS. 2 and 3.

[0075] The recognition result database (recognition result DB) 25 holds:
- the recognition result candidates produced by the voice recognition device 13, with their confidence data;
- the recognition result determined by the recognition result determination device 14;
- the voice recognition device 13's recognition data for that determined candidate.

[0076] The dialogue history database (dialogue history DB) 26 holds transition data for three things: the proficiency level determined by the proficiency level determination device 15, the environment level determined by the environment level determination device 16, and the dialogue controlled by the dialogue control device 17.

[0077] FIG. 2 is an explanatory diagram of the frames and semantic information stored in the frame/semantic dictionary 24. It shows an airline ticket reservation frame 30 and the semantic information 31 used when reserving an airline ticket.

[0078] The airline ticket reservation frame 30 has the slots (items) needed to reserve an airline ticket, namely five slots: "Departure", "Arrival", "Date: month", "Date: day", and "Time". In the example shown, these slots contain "Haneda", "Sapporo", "December", "31st", and "5:50 p.m.", respectively.

[0079] The semantic information 31 consists of constraints on what may be entered in each slot of the airline ticket reservation frame 30.

[0080] For example, the "inter-airport-slot constraint" 31a specifies that (1) the same airport name cannot be entered in both slots, and (2) a pair of airports with no flight between them cannot be specified.

[0081] The "date slot constraint" 31b specifies that (1) the month is an integer from 1 to 12, (2) the day is an integer from 1 to 31, and (3) a past date cannot be specified.

[0082] The "time slot constraint" 31c specifies that (1) the hour is an integer from 1 to 12 (or from 1 to 24 with a 24-hour clock), and (2) the minute is an integer from 0 to 59.

[0083] By using the slot constraints in the semantic information 31 in addition to the recognition results of the voice recognition device 13, the recognition result determination device 14 can therefore achieve higher recognition accuracy.
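The slot constraints 31a to 31c can be expressed as simple predicates, as in the sketch below. The route table is hypothetical, because the patent only says that airport pairs with no flight are rejected. `month_ok` checks "no past date" at month granularity and assumes the trip falls in the current calendar year, which matches the example in [0087], where February is rejected on September 15.

```python
from datetime import date

# Hypothetical route table: the patent only says pairs with no flight are rejected.
ROUTES = {("Haneda", "Sapporo"), ("Sapporo", "Haneda")}


def airports_ok(departure: str, arrival: str) -> bool:
    """Inter-airport-slot constraint 31a: (1) different airports, (2) a flight exists."""
    return departure != arrival and (departure, arrival) in ROUTES


def month_ok(month: int, today: date) -> bool:
    """Date slot constraint 31b (1) and (3), assuming the trip is in the current year."""
    return 1 <= month <= 12 and month >= today.month


def day_ok(day: int) -> bool:
    """Date slot constraint 31b (2)."""
    return 1 <= day <= 31


def time_ok(hour: int, minute: int, use_24h: bool = False) -> bool:
    """Time slot constraint 31c: hour 1-12 (or 1-24), minute 0-59."""
    max_hour = 24 if use_24h else 12
    return 1 <= hour <= max_hour and 0 <= minute <= 59
```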

[0084] FIG. 3 illustrates how the recognition result determination device 14 uses these slot constraints to select one recognition result from several recognition result candidates.

[0085] FIG. 3 shows the procedure for determining the month in the airline ticket reservation frame 30, that is, for filling the "Date: month" slot.

[0086] When the "Date: month" slot is filled, two parts of the "date slot constraint" 31b apply: (1) the month is an integer from 1 to 12, and (3) a past date cannot be specified.

[0087] Suppose, for example, that:
- the current date is September 15;
- the user utterance (the user's actual utterance) 35 is "December";
- in recognition process 36, the voice recognition device 13 produces three candidates, "February", "the 14th month", and "December", in descending order of confidence.

In recognition determination process 37, the recognition result determination device 14 first deletes "the 14th month" under constraint (1), since the month must be an integer from 1 to 12. It then deletes "February" under constraint (3), since a past date cannot be specified.

[0088] As a result, "December" is passed to the application (application main body) 38 as the determined result, even though it had only the third-highest confidence in the recognition by the voice recognition device 13.
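The filtering in [0087] and [0088] can be sketched as: discard N-best candidates that violate the applicable constraints, then take the best survivor. The confidence scores below are invented to reproduce the ordering in the example. Returning `None` when nothing survives is an assumption, since the patent does not say what happens in that case.

```python
from datetime import date
from typing import List, Optional, Tuple


def pick_month(nbest: List[Tuple[int, float]], today: date) -> Optional[int]:
    """Drop candidates violating date slot constraints (1) and (3),
    then return the surviving month with the highest confidence."""
    surviving = [(month, conf) for month, conf in nbest
                 if 1 <= month <= 12          # constraint (1)
                 and month >= today.month]    # constraint (3), same-year assumption
    if not surviving:
        return None  # no valid candidate; the caller decides how to re-prompt
    return max(surviving, key=lambda mc: mc[1])[0]


# Candidates in descending confidence; the scores are made up for illustration.
nbest = [(2, 0.61), (14, 0.23), (12, 0.16)]
print(pick_month(nbest, date(1998, 9, 15)))  # 12
```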

[0089] FIG. 4 is a diagram explaining how the proficiency level determination device 15 determines proficiency.

[0090] Proficiency is determined from two inputs: the user's utterance recognition result determined by the recognition result determination device 14, and the analysis result of the waveform feature analysis device 19, which analyzes the output of the voice waveform feature extraction device 12.

[0091] Once the recognition result determination device 14 has determined the user's utterance recognition result, proficiency is judged from three quantities:
(1) the time A from when the voice output device 20 starts speaking the voice guide until the user speaks;
(2) the user's utterance time T;
(3) the number of sounds (number of words) N uttered by the user, obtained from the analysis result of the waveform feature analysis device 19.

The shorter A and T and the smaller N, the higher the proficiency. The longer A and T and the larger N, the lower the proficiency.

[0092] The reasoning is as follows. A user who is familiar with the device can quickly decide what to input and speaks without waiting for the voice guide to finish, so A is short. Such a user also tends to say only the minimum necessary, so T is short and the number of sounds (words) N is small.

[0093] With A, T, and N defined as above, let n be the number of sounds (words) the device expects a standard user to utter. The proficiency level P is then obtained by the following equation.

[0094] $P = A - (N - n + T)$

[0095] Here, A takes the following crisp values:
3: the user speaks during the first half of the voice guide;
2: the user speaks between the second half of the voice guide and just after it ends (up to about 0.2 seconds after the end);
1: the user speaks after the voice guide has ended, or does not speak at all.
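A minimal sketch of [0094] and [0095] follows. The patent does not give units for T or the thresholds that map P to low, medium, or high, so the sketch assumes T is in seconds and returns P unclassified. The parameter names are hypothetical.

```python
from typing import Optional


def crisp_a(onset: Optional[float], guide_duration: float, grace: float = 0.2) -> int:
    """Crisp score A of [0095]. `onset` is seconds from the start of the voice
    guide to the start of the user's speech, or None if the user said nothing."""
    if onset is None:
        return 1
    if onset < guide_duration / 2:
        return 3  # spoke during the first half of the guide
    if onset <= guide_duration + grace:
        return 2  # second half, up to about 0.2 s after the guide ends
    return 1      # spoke after the guide had ended


def proficiency(a: int, n_uttered: int, n_expected: int, t_speech: float) -> float:
    """P = A - (N - n + T) from [0094]; t_speech assumed to be in seconds."""
    return a - (n_uttered - n_expected + t_speech)


# A user who barges in 1.0 s into a 4 s guide, says the expected 2 words in 1.0 s:
print(proficiency(crisp_a(1.0, 4.0), 2, 2, 1.0))  # 2.0
```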

[0096] This concludes the description of how the proficiency level determination device 15 determines proficiency.

[0097] FIG. 5 is a diagram explaining how the environment level determination device 16 determines the environment level.

[0098] The environment level is determined from two inputs: the user's utterance recognition result determined by the recognition result determination device 14, and the analysis result of the waveform feature analysis device 19, which analyzes the output of the voice waveform feature extraction device 12.

[0099] FIG. 5 shows three quantities:
- the strength of the environmental noise picked up while neither the device nor the user is speaking;
- the strength of the user's utterance;
- the difference between the user's utterance strength and the environmental noise strength.

[0100] The smaller the environmental noise and the stronger the user's utterance, the better the environment is judged to be. Conversely, the larger the environmental noise and the weaker the user's utterance, the worse the environment is judged to be. Even when the environmental noise is large, the environment is judged to be good if the difference between the utterance strength and the noise strength is large, that is, if the user's utterance is strong relative to the noise.

[0101] The voice signal and the noise are separated using a known technique such as the FFT (fast Fourier transform). Let $G(i)$ be the amplitude in the frequency domain obtained by the FFT at an arbitrary point $i$, and let $g(i)$ be the noise amplitude. The respective energies are then given by the following equations.

[0102] The energy $E_1$ in the frequency domain, including noise, is
$E_1 = \sum_i G(i)^2$;
the energy $E_2$ of the noise in the frequency domain is
$E_2 = \sum_i g(i)^2$;
and the energy $E_3$ of the voice in the frequency domain is
$E_3 = \sum_i G(i)^2 - \sum_i g(i)^2$.

[0103] The environment level $S$ is therefore given by $S = \alpha \, E_3 / E_1$, where $\alpha$ is a constant.
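The sketch below computes $E_1$, $E_2$, $E_3$, and $S$ with NumPy. It assumes that the noise spectrum $g(i)$ is estimated from a separate noise-only frame of the same length, recorded while neither party is speaking as in [0099], and that the one-sided `numpy.fft.rfft` spectrum is an acceptable stand-in for the FFT of [0101]. The thresholds that map $S$ to excellent, good, or acceptable are not given in the patent and are omitted.

```python
import numpy as np


def environment_level(mixed: np.ndarray, noise_only: np.ndarray, alpha: float = 1.0) -> float:
    """S = alpha * E3 / E1 as in [0101] to [0103]; both frames must have the same length."""
    if mixed.shape != noise_only.shape:
        raise ValueError("frames must have the same length")
    G = np.abs(np.fft.rfft(mixed))        # amplitude spectrum of speech plus noise
    g = np.abs(np.fft.rfft(noise_only))   # amplitude spectrum of noise alone
    e1 = np.sum(G ** 2)                   # E1: energy including noise
    e2 = np.sum(g ** 2)                   # E2: noise energy
    e3 = e1 - e2                          # E3: voice energy (can go negative if noise dominates)
    return float(alpha * e3 / e1)


# Synthetic 1 s telephone-band frames at 8 kHz: a 440 Hz "voice" over white noise.
rng = np.random.default_rng(0)
t = np.arange(8000) / 8000
noise_only = rng.normal(0, 0.1, t.size)
loud = np.sin(2 * np.pi * 440 * t) + rng.normal(0, 0.1, t.size)
print(environment_level(loud, noise_only))
```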

[0104] FIG. 6 is a diagram explaining how the voice guide selection device 18 selects a voice guide.

[0105] The voice guide is selected based on three inputs:
- the user's proficiency, determined by the proficiency level determination device 15;
- the environment level, determined by the environment level determination device 16;
- the history of the dialogue between the device and the user, controlled by the dialogue control device 17.

The selection refers to the data stored in the waveform feature database (waveform feature DB) 23, the recognition result database (recognition result DB) 25, and the dialogue history database (dialogue history DB) 26.

[0106] The voice guide selection device 18 consists of voice guide rules 40 and a voice guide database (voice guide DB) 41. It changes the content and speaking rate of the spoken voice guide according to the proficiency level, the environment level, and the number of re-inputs:
- If proficiency is high, the short guide "Your departure point, please" is spoken at a fast rate.
- On re-input, the environment level is checked. If the environment is poor, a poor-environment voice guide is spoken before the input-request voice guide.
- Each time re-input is repeated, the voice guide is changed to a more detailed one.

[0107] For the departure voice guide, for example, the following rules are provided as the voice guide rules 40:
(1) IF proficiency = high THEN voice guide = departure[0], speaking rate = fast
(2) IF proficiency = medium THEN voice guide = departure[1], speaking rate = medium
(3) IF proficiency = low THEN voice guide = departure[2], speaking rate = slow
(4) IF 0 < loop count < 3 THEN voice guide = departure[loop count], speaking rate = fast
(5) IF environment level = acceptable THEN voice guide = poor environment[0] + voice guide
(6) loop count += 1

[0108] The voice guide database (voice guide DB) 41 stores the following voice guide data for the departure request and for a poor environment.

[0109] Departure request [3]:
"Your departure point, please."
"Please say the name of your departure airport."
"Please say the name of your departure airport, for example, 'Haneda Airport.'"
Poor environment:
"The surrounding environment or the line condition is poor. Please hold the handset a little closer to your mouth and speak."
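The rules of [0107] and the guide data of [0109] can be combined into one selection function, sketched below. The patent does not say how conflicting rules are resolved. This sketch assumes the rules fire in the listed order, so a later rule overrides an earlier one, and applies rule (5) on every turn exactly as listed. The function name and its return shape are hypothetical.

```python
from typing import List, Tuple

# Voice guide DB 41 entries from [0109] (English renderings).
DEPARTURE = [
    "Your departure point, please.",
    "Please say the name of your departure airport.",
    "Please say the name of your departure airport, for example, 'Haneda Airport.'",
]
BAD_ENVIRONMENT = [
    "The surrounding environment or the line condition is poor. "
    "Please hold the handset a little closer to your mouth and speak.",
]
BY_PROFICIENCY = {"high": (0, "fast"), "medium": (1, "medium"), "low": (2, "slow")}


def select_departure_guide(proficiency: str, environment: str,
                           loop_count: int) -> Tuple[List[str], str, int]:
    """Apply rules (1)-(6) in order; later rules override earlier ones (an assumption).
    Returns the prompts to speak, the speaking rate, and the updated loop count."""
    index, rate = BY_PROFICIENCY[proficiency]  # rules (1)-(3)
    if 0 < loop_count < 3:                     # rule (4): re-input
        index, rate = loop_count, "fast"
    prompts = [DEPARTURE[index]]
    if environment == "acceptable":            # rule (5)
        prompts.insert(0, BAD_ENVIRONMENT[0])
    return prompts, rate, loop_count + 1       # rule (6)


prompts, rate, loops = select_departure_guide("high", "acceptable", 1)
```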

[0110] This concludes the description of the voice guide selection device 18.

[0111] Next, the processing procedure of the present embodiment will be described with reference to the flowchart of FIG. 7. In the present embodiment, three types of voice guidance are used. The type is chosen according to the proficiency level determined by the proficiency level determination device 15 and the environment level determined by the environment level determination device 16.

[0112] These three types of voice guidance correspond to the three types (a), (b), and (c) shown in FIG. 9:
- Type (a), FIG. 9(a): items are input one at a time, and each item is confirmed individually.
- Type (b), FIG. 9(b): items are input two at a time, and each pair is confirmed together.
- Type (c), FIG. 9(c): all items are input at once, and then all items are confirmed.

[0113] Referring to FIG. 7, when the program starts, it checks whether the proficiency level is "high", "medium", or "low" (step 100).

[0114] If the proficiency level is "high", the process proceeds to step 102, which checks whether the environment level is "high", "medium", or "low". If the environment level is "high", type (c) processing is performed. The departure point, destination, date, and time are all input (step 104) and then confirmed (step 106). If even one item is wrong in this confirmation (NO in step 106), the input of step 104 is repeated. When all items have been confirmed (YES in step 106), an overall confirmation is performed (step 134) and the processing ends.

[0115] If, on the other hand, step 102 judges the environment level to be "medium", type (b) processing is performed and the process proceeds to step 126. If the environment level is judged to be "low", type (a) processing is performed and the process proceeds to step 108.

[0116] If step 100 judges the proficiency level to be "low", type (a) processing is performed and the process proceeds to step 108. In step 108, the departure point is input first, and it is then confirmed (step 110).

[0117] If the departure point cannot be confirmed (NO in step 110), step 108 is performed again. If it is confirmed (YES in step 110), the process proceeds to step 112.

[0118] In step 112, the destination is input and then confirmed (step 114).

[0119] If the destination cannot be confirmed (NO in step 114), step 112 is performed again. If it is confirmed (YES in step 114), the process proceeds to step 116.

[0120] In step 116, the date is input and then confirmed (step 118).

[0121] If the date cannot be confirmed (NO in step 118), step 116 is performed again. If it is confirmed (YES in step 118), the process proceeds to step 120.

[0122] In step 120, the time is input and then confirmed (step 122).

[0123] If the time cannot be confirmed (NO in step 122), step 120 is performed again. If it is confirmed (YES in step 122), the process proceeds to step 134, where the overall confirmation is performed, and the processing ends.

[0124] If step 100 judges the proficiency level to be "medium", the process proceeds to step 124, where the environment level is checked.

[0125] If the environment level is "low", type (a) processing is performed, starting at step 108.

[0126] If the environment level is anything other than "low", that is, "high" or "medium", the process proceeds to step 126 and type (b) processing is performed.

[0127] In step 126, the departure point and destination are input first, and they are then confirmed (step 128).

[0128] If the departure point and destination cannot be confirmed (NO in step 128), step 126 is performed again. If they are confirmed (YES in step 128), the process proceeds to step 130.

[0129] In step 130, the date and time are input and then confirmed (step 132).

[0130] If the date and time cannot be confirmed (NO in step 132), step 130 is performed again. If they are confirmed (YES in step 132), the process proceeds to step 134, where the overall confirmation is performed, and the processing ends.
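The branching and input/confirmation loops of FIG. 7 can be summarized as in the sketch below. `ask` and `confirm` are hypothetical callbacks standing in for the voice prompt, recognition, and confirmation exchange. As in the flowchart, a group is repeated with no retry limit until it is confirmed. A deployed system might cap the retries, but the patent does not describe a cap.

```python
from typing import Callable, Dict, List

ITEMS = ["departure", "destination", "date", "time"]

# How each guidance type of FIG. 9 groups the items for input and confirmation.
GROUPS: Dict[str, List[List[str]]] = {
    "a": [["departure"], ["destination"], ["date"], ["time"]],  # steps 108-122
    "b": [["departure", "destination"], ["date", "time"]],      # steps 126-132
    "c": [ITEMS],                                               # steps 104-106
}


def choose_type(proficiency: str, environment: str) -> str:
    """Branching of steps 100, 102 and 124 in FIG. 7."""
    if proficiency == "high":
        return {"high": "c", "medium": "b", "low": "a"}[environment]
    if proficiency == "medium":
        return "a" if environment == "low" else "b"
    return "a"  # proficiency "low"


def run_dialogue(guide_type: str,
                 ask: Callable[[List[str]], Dict[str, str]],
                 confirm: Callable[[Dict[str, str]], bool]) -> Dict[str, str]:
    """Input and confirm each group, repeating a group until it is confirmed.
    The overall confirmation of step 134 is left to the caller."""
    result: Dict[str, str] = {}
    for group in GROUPS[guide_type]:
        while True:
            values = ask(group)
            if confirm(values):
                result.update(values)
                break
    return result
```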

[0131] This completes the processing procedure of the present embodiment.

[0132] Next, the interaction among the components shown in FIG. 1 will be described with reference to FIG. 8.

[0133] In FIG. 8, when voice is input to the voice input device 11, the resulting voice signal is sent to both the voice waveform feature extraction device 12 and the voice recognition device 13.

[0134] From the voice signal it receives, the voice waveform feature extraction device 12 extracts features including:
- the number of words uttered;
- the speaking rate and its variation;
- the utterance strength and its variation;
- the delay between the moment the user is able to speak and the actual utterance.

The waveform feature analysis device 19 then processes these data into a form usable by the proficiency level determination device 15. The processed data are stored in the waveform feature database (waveform feature DB) 23.

[0135] The voice recognition device 13 converts the voice signal it receives into character information, with reference to the recognition dictionary 22, and produces several recognition result candidates.

[0136] The recognition result determination device 14 determines one recognition result from among the candidates obtained by the voice recognition device 13, with reference to the frame/semantic dictionary 24.

[0137] The data obtained by the voice recognition device 13 and the recognition result determination device 14 are stored in the recognition result database (recognition result DB) 25.

[0138] Once the recognition result determination device 14 has settled the voice information into a single piece of character information, two judgments are made. The proficiency level determination device 15 determines the user's proficiency level, and the environment level determination device 16 determines the user's environment level.

[0139] The dialogue control device 17 controls the flow of the dialogue, as shown in FIG. 7. It does so based on the proficiency level determined by the proficiency determining device 15 and the environment level determined by the environment level determining device 16.

[0140] The voice guide selecting device 18 refers to the voice guide dictionary 21 and selects the type and the speed of the voice guide. The selection is based on three inputs: the proficiency level determined by the proficiency determining device 15, the environment level determined by the environment level determining device 16, and the control result of the dialogue control device 17. At this time, the dialogue history between the device and the user is stored in a dialogue history database (dialogue history DB) 16, and the voice guide selecting device 18 uses this stored history.
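Below is a minimal sketch of the selection step, assuming two-level ratings for proficiency and environment. The level names, guide labels and speed multipliers are hypothetical. For brevity it also omits the dialogue control result and the dialogue history, which the device described above also takes into account.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceGuide:
    kind: str     # hypothetical labels: "detailed" or "brief"
    speed: float  # hypothetical playback-rate multiplier


def select_voice_guide(proficiency: str, environment: str) -> VoiceGuide:
    """Choose guide type and speed from hypothetical two-level ratings.

    proficiency: "novice" or "expert"; environment: "quiet" or "noisy".
    """
    if proficiency == "expert":
        kind, speed = "brief", 1.2
    else:
        kind, speed = "detailed", 0.9
    if environment == "noisy":
        speed = min(speed, 1.0)  # do not speed up speech over a poor line
    return VoiceGuide(kind, speed)


print(select_voice_guide("expert", "noisy"))
# VoiceGuide(kind='brief', speed=1.0)
```

A lookup keyed on both ratings keeps the policy in one place. A real system might instead read it from the voice guide dictionary.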

[0141] The voice guide selected by the voice guide selecting device 18 is output from the voice output device 20.

[0142] As described above, the first embodiment determines the user's skill level and environment, and changes the voice guide, the dialogue and the dialogue flow accordingly. The device can therefore respond appropriately to the user's skill level, the surrounding environment and the state of the line. This realizes a natural dialogue that places no burden on the user and also shortens the call time.

[0143] The result is a dialogue device that does not stress the user, and the telephone line can be used more efficiently.

[0144] Conventional voice recognition response devices also had the following problems.

[0145] (1) It has been assumed that conventional voice recognition response devices could achieve smooth dialogue simply by improving recognition accuracy. However, recognizing the utterances of an unspecified large number of people with 100% accuracy is extremely difficult. A second difficulty follows once a user's utterance has been misrecognized. When asked to input it again, the user does not understand why it was not recognized and typically repeats the same utterance, which is then misrecognized again. After such a misrecognition, the same question is repeated until the correct answer is obtained, and in the end the user is transferred to an operator to convey the request. This does not meet the user's need to convey the request in as short a dialogue as possible.

[0146] (2) The user's speech volume strongly affects recognition accuracy, yet no countermeasure has conventionally been taken. Furthermore, if features are calculated after filter processing such as gain adjustment or modulation, the calculated features may be slightly shifted, which may degrade recognition accuracy.

[0147] (3) The device confirms a recognition result with a repeat prompt (question). When that prompt contains a multi-digit number, it has conventionally been built by playing prerecorded single-digit prompts (0 to 9) once per digit. The timing between the digits therefore sounds unnatural, and the prompt is hard to follow.

[0148] A second embodiment is therefore described below: a voice recognition response device that adopts the following configuration.
(1) The same system prompt is repeated as little as possible during the dialogue, reducing the user's discomfort.
(2) Instead of repeating a question with the same content, the device asks a related question to narrow down the recognition results, raising recognition accuracy and shortening the dialogue.
(3) Based on how the user responds, the device prompts the user in ways that make the speech easier to recognize, raising recognition accuracy and shortening the dialogue.
(4) Numeric repeat prompts, which otherwise sound unnatural, are generated in a new way, so that easy-to-hear system prompts can be produced at low cost.

[0149] In the following description, the embodiment is applied to a voice recognition response device that provides an aircraft departure and arrival guidance service.

[0150] The aircraft departure and arrival guidance service offers two kinds of guidance. In the first, the user enters a flight number and the device speaks the departure and arrival information for that flight. In the second, the user enters a departure point and an arrival point, and the device speaks the information for all flights operated on that route.

[0151] FIG. 12 is a block diagram showing the overall processing flow of this embodiment.

[0152] In the following description, the overall processing flow is explained state by state. A state is a pair consisting of a system prompt (an utterance from the device) and the user's utterance (answer) to it.

[0153] In the following description, system prompt utterances are enclosed in double quotation marks ("...") and the user's utterances in single quotation marks ('...').

[0154] (201) Service selection state. As described above, this embodiment provides two kinds of service. This state asks the user which one to use and obtains the response.

[0155] The system prompt is "Which service would you like to use, flight number or route?", and the user answers 'flight number' or 'route'. The service branches according to this choice. If the answer is recognized, the dialogue transitions to the selected service confirmation state 202. If it is not recognized, the dialogue transitions to the YesNo state 203.

[0156] (202) Selected service confirmation state. This state repeats the selected service back to the user in a repeat prompt and asks whether it is correct. The system prompt is "Is the 'flight number service' correct?" or "Is the 'route service' correct?", and the user answers either 'yes' or 'no'.

[0157] If the user answers 'yes', the flow proceeds to the recognition result determining unit 204. If the user answers 'no', or the answer cannot be recognized, the flow proceeds to the YesNo state 203.

[0158] (203) YesNo state. This state is entered in three cases: when the response could not be recognized in the service selection state 201, when the selected service confirmation state 202 revealed a misrecognition, or when 'yes' or 'no' could not be recognized there. The system prompt is either "Flight number, right?" or "Route, right?", so the user need only answer 'yes' or 'no'.
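The transitions among states 201 to 204 can be summarized as a small transition function. In the sketch below, `None` stands for an unrecognizable utterance, and the recognized-input labels are illustrative. Repeating state 203 on an unrecognized answer follows steps 253 and 256 of the flowchart in this document.

```python
from typing import Optional


def next_state(state: int, recognized: Optional[str]) -> int:
    """Next state in the service-selection part of FIG. 12.

    recognized: the recognized text, or None if recognition failed.
    """
    if state == 201:  # service selection
        return 202 if recognized in ("flight number", "route") else 203
    if state == 202:  # selected service confirmation
        return 204 if recognized == "yes" else 203
    if state == 203:  # YesNo state: repeat until 'yes' or 'no' is recognized
        return 204 if recognized in ("yes", "no") else 203
    raise ValueError(f"state {state} is outside the service-selection part")


print(next_state(201, None), next_state(202, "yes"))
# 203 204
```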

[0159] That is, when the system prompt is "Flight number, right?", a 'yes' from the user shows that the user wants the flight number information service. A 'no' shows that the user wants the route information service.

[0160] Likewise, when the system prompt is "Route, right?", a 'yes' shows that the user wants the route information service, and a 'no' shows that the user wants the flight number information service.
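The resolution rule in the two paragraphs above fits in a few lines. Because only two services exist, a 'no' to one proposal selects the other. The service labels are illustrative strings.

```python
def resolve_service(proposed: str, answer: str) -> str:
    """Map a yes/no answer to '<proposed>, right?' onto the chosen service."""
    other = {"flight number": "route", "route": "flight number"}
    if answer == "yes":
        return proposed
    if answer == "no":
        return other[proposed]
    raise ValueError("answer must be 'yes' or 'no'")


print(resolve_service("route", "no"))
# flight number
```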

[0161] (204) Recognition result determining unit. Here, the service the user will use is fixed as either "flight number" or "route" based on the recognition result.

[0162] (205) Previous-state user speech volume checking unit. The user has already spoken several times in the preceding states. This unit measures the speech volume of those utterances and passes it to the system prompt volume adjusting unit 206.
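Below is a minimal sketch of the volume check, assuming that "speech volume" is measured as the RMS level of each earlier utterance in dB relative to full scale, averaged over all utterances captured so far. This document does not define the measure, so both the dBFS scale and the plain averaging are assumptions.

```python
import math
from typing import List, Sequence


def level_dbfs(samples: Sequence[float]) -> float:
    """RMS level of one utterance in dB relative to full scale (1.0)."""
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return 20 * math.log10(max(rms, 1e-10))  # floor avoids log10(0)


def previous_state_level(utterances: List[Sequence[float]]) -> float:
    """Average level over the user's earlier, non-empty utterances."""
    levels = [level_dbfs(u) for u in utterances if len(u) > 0]
    if not levels:
        raise ValueError("no utterances captured yet")
    return sum(levels) / len(levels)


print(round(previous_state_level([[0.1, -0.1], [0.1, 0.1]]), 6))
# -20.0
```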

[0163] (206) System prompt volume adjusting unit. This unit adjusts the system prompt volume based on the user's speech volume, as detected by the previous-state user speech volume checking unit 205.

[0164] The processing procedure is described in detail later. In outline, the louder the user speaks, the louder the system prompt; the more quietly the user speaks, the quieter the system prompt.
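This document says only that the adjustment uses either a volume table or a volume formula, detailed later. The following sketch assumes the simplest monotonic formula: a linear mapping from user level to prompt level, clamped to a safe range. The reference level, gain and limits are hypothetical constants.

```python
def prompt_volume_db(user_db: float,
                     ref_user_db: float = -20.0,
                     base_prompt_db: float = -20.0,
                     gain: float = 0.5,
                     min_db: float = -30.0,
                     max_db: float = -10.0) -> float:
    """Louder user -> louder prompt; quieter user -> quieter prompt.

    All levels are in dBFS; every default value is a hypothetical choice.
    """
    target = base_prompt_db + gain * (user_db - ref_user_db)
    return max(min_db, min(max_db, target))


print(prompt_volume_db(-30.0))
# -25.0
```

Clamping keeps an unusually loud or quiet caller from pushing the prompt to an uncomfortable or inaudible level. A lookup table could replace the formula without changing the interface.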

[0165] (207) Service branching unit. Here the flow branches to either the flight number information service or the route information service.

[0166] (208) Flight number recognition state. This state is entered when the flight number information service is used. The system prompt is "Please say the flight number you would like to look up." The user says the flight number either with place values or without them (digit by digit).

[0167] For example, with place values the user says 'nanahyakusanjugo-bin' (flight "seven hundred thirty-five"). Without place values the user says 'nanasango-bin' (flight "seven-three-five").
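Because the user may say either form, a recognizer vocabulary would plausibly need both readings of each flight number. This document does not describe how its vocabulary is built, so the sketch below is an illustration only. It generates both forms for 1 to 999, using the romanization above and one reading per digit ("nana" for 7, "yon" for 4). The alternative readings "shichi" and "shi" are deliberately left out.

```python
from typing import Set

DIGITS = {"0": "zero", "1": "ichi", "2": "ni", "3": "san", "4": "yon",
          "5": "go", "6": "roku", "7": "nana", "8": "hachi", "9": "kyu"}

# Hundreds with sound changes (300 sanbyaku, 600 roppyaku, 800 happyaku).
HUNDREDS = {1: "hyaku", 3: "sanbyaku", 6: "roppyaku", 8: "happyaku"}


def digit_reading(number: str) -> str:
    """Without place values: read each digit ('735' -> 'nanasango')."""
    return "".join(DIGITS[d] for d in number)


def place_reading(n: int) -> str:
    """With place values, 1..999 only (735 -> 'nanahyakusanjugo')."""
    if not 1 <= n <= 999:
        raise ValueError("this sketch handles 1..999 only")
    h, t, o = n // 100, n // 10 % 10, n % 10
    parts = []
    if h:
        parts.append(HUNDREDS.get(h, DIGITS[str(h)] + "hyaku"))
    if t:
        parts.append("ju" if t == 1 else DIGITS[str(t)] + "ju")
    if o:
        parts.append(DIGITS[str(o)])
    return "".join(parts)


def vocabulary_entries(n: int) -> Set[str]:
    """Both spoken forms of a flight number, each followed by 'bin' (flight)."""
    return {place_reading(n) + "bin", digit_reading(str(n)) + "bin"}


print(sorted(vocabulary_entries(735)))
# ['nanahyakusanjugobin', 'nanasangobin']
```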

[0168] In the flight number recognition state 208, "unrecognizable" means that the state is repeated when recognition fails. The same applies to the states below.

[0169] (209) Flight number confirmation state. A repeat prompt confirms whether the flight number recognized in the flight number recognition state 208 is correct.

[0170] If the user answers 'yes', the flow proceeds to the recognition result determining unit 211. If the user answers 'no', the flow proceeds either to the number pronunciation prompt generating unit 210 or back to the flight number recognition state 208. It proceeds to unit 210 when the recognized number contains a 1 or a 7.

[0171] (210) Number pronunciation prompt generating unit. Suppose the user answers 'no' in the flight number confirmation state 209 and the recognized number contains a 1 or a 7. The device then utters the system prompt "When you pronounce the number 7, please say 'nana'."

[0172] (211) Recognition result determining unit. This unit fixes the flight number whose operation status the user wants to know.

[0173] (212) Operation status guidance unit. This unit informs the user of the operation status. When the flight number information service is used, it speaks the departure and arrival status of that flight.

[0174] The above is the processing flow when the flight number information service is selected.

[0175] (213) Departure airport name recognition state. This state is entered from the service branching unit 207 when the route information service is selected. The system prompt is "Please say the departure airport of the route you would like to look up." The user then says the departure airport name.

[0176] In this departure airport name recognition state, several airport names with similar pronunciations are recognized as candidates.

[0177] (214) Departure airport name confirmation state. Of the airport names recognized in the departure airport name recognition state 213, the one with the highest confidence is repeated back in a repeat prompt to confirm the departure airport. If the user answers 'yes', the flow proceeds to the recognition result determining unit 217. If the user answers 'no', it proceeds to the prefecture name recognition state 215.

[0178] (215) Prefecture name recognition state. When the user answers 'no' in the departure airport name confirmation state 214, this state asks for the prefecture in which the airport is located. For example, suppose the user said 'Itami' but it was recognized as 'Kitami'. The system prompt "Please say the name of the prefecture where the departure airport is located." is then uttered, and the user answers 'Osaka'.

[0179] (216) Airport name confirmation state. This state confirms the airport name using two inputs: the recognition result of the prefecture name recognition state 215, and the multiple candidate names recognized in the departure airport name recognition state 213.

[0180] For example, suppose the prefecture name recognized in the prefecture name recognition state 215 is "Osaka", and "Itami" is among the candidates recognized in the departure airport name recognition state 213. The system prompt "Is Itami Airport in Osaka correct?" is then uttered, and the user answers 'yes' or 'no'. For 'yes', the flow proceeds to the recognition result determining unit 217. For 'no', it returns to the prefecture name recognition state 215.
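The matching step can be sketched as a scan of the confidence-ordered candidate list for the first airport located in the recognized prefecture. The airport-to-prefecture table below is an illustrative excerpt built only from this document's examples. A real system would load a complete table.

```python
from typing import Dict, List, Optional

# Illustrative excerpt built only from the examples in this document.
AIRPORT_PREFECTURE: Dict[str, str] = {"Itami": "Osaka", "Kitami": "Hokkaido"}


def match_airport(candidates: List[str], prefecture: str) -> Optional[str]:
    """Return the highest-confidence candidate located in `prefecture`.

    `candidates` must already be sorted by recognizer confidence,
    highest first. Returns None if no candidate lies in the prefecture.
    """
    for name in candidates:
        if AIRPORT_PREFECTURE.get(name) == prefecture:
            return name
    return None


match = match_airport(["Kitami", "Itami"], "Osaka")
if match is not None:
    print(f"Is {match} Airport in Osaka correct?")
# Is Itami Airport in Osaka correct?
```

Returning `None` corresponds to the case in which no candidate fits. How the dialogue proceeds then is not stated in this part of the document.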

[0181] (217) Recognition result determining unit. This unit fixes the departure airport of the route whose operation status the user wants to know.

[0182] (218) Arrival airport name recognition state. This state is entered once the recognition result determining unit 217 has fixed the departure airport name. The system prompt is "Please say the arrival airport name.", and the user says the arrival airport name.

[0183] In this arrival airport name recognition state as well, several airport names are recognized as candidates.

[0184] (219) Arrival airport name confirmation state. Of the airport names recognized in the arrival airport name recognition state 218, the one with the highest confidence is repeated back in a repeat prompt to confirm the arrival airport. If the user answers 'yes', the flow proceeds to the recognition result determining unit 222. If the user answers 'no', it proceeds to the prefecture name recognition state 220.

[0185] (220) Prefecture name recognition state. When the user answers 'no' in the arrival airport name confirmation state 219, this state asks for the prefecture in which the airport is located. For example, suppose the user said 'Kitami' but it was recognized as "Itami". The system prompt "Please say the name of the prefecture where the arrival airport is located." is then uttered, and the user answers 'Hokkaido'.

[0186] (221) Airport name confirmation state. This state confirms the airport name using two inputs: the recognition result of the prefecture name recognition state 220, and the multiple candidate names recognized in the arrival airport name recognition state 218.

[0187] For example, suppose the prefecture name recognized in the prefecture name recognition state 220 is "Hokkaido", and "Kitami" is among the candidates recognized in the arrival airport name recognition state 218. The system prompt "Is Kitami Airport in Hokkaido correct?" is then uttered, and the user answers 'yes' or 'no'. For 'yes', the flow proceeds to the recognition result determining unit 222. For 'no', it returns to the prefecture name recognition state 220.

[0188] (222) Recognition result determining unit. This unit fixes the arrival airport of the route whose operation status the user wants to know.

[0189] (212) Operation status guidance unit. When the route information service is used, this unit speaks the departure and arrival status for that route.

[0190] The above is the processing flow when the route information service is selected.

[0191] Next, the procedure of the aircraft departure and arrival guidance service in this embodiment's voice recognition response device is described with reference to the flowcharts of FIGS. 13 to 16.

[0192] In this processing, the service to be used is selected first (step 251). This begins with the following system prompts: "This is the aircraft departure and arrival guidance service." "This service tells you the departure and arrival status of aircraft, based on the flight number you want to look up or the departure and arrival airports of a route." "Which service would you like to use, flight number or route?"

[0193] The user then responds, and the device checks whether the response is recognizable (step 252).

[0194] If the user's response cannot be recognized (NO in step 252), the flow proceeds to step 253. If it can be recognized (YES in step 252), the recognition result is confirmed (step 254).

[0195] The system prompt here is "Is the flight number service correct?" or "Is the route service correct?"

[0196] If the user answers 'yes' (YES in step 254), the service to be used is fixed (step 255). If the user answers 'no' (NO in step 254), the flow proceeds to step 253.

[0197] After NO in step 252 or NO in step 254, the flow proceeds to step 253, where the device asks which service is to be used.

[0198] The system prompt here is "Flight number, right?" or "Route, right?"

[0199] The user utters a response, and the device checks whether it is recognizable (step 256).

[0200] If it cannot be recognized (NO in step 256), the flow returns to step 253. If it can be recognized (YES in step 256), the service to be used is fixed (step 255).

[0201] That is, if the user answers 'yes' to "Flight number, right?", the service is fixed as "flight number".

[0202] If the user answers 'no' to "Flight number, right?", the service is fixed as "route".

[0203] If the user answers 'yes' to "Route, right?", the service is fixed as "route".

[0204] If the user answers 'no' to "Route, right?", the service is fixed as "flight number".

[0205] Once the service has been fixed by the above processing, the user's speech volume is checked (step 257). The user's utterances have already been captured several times in the preceding processing, so the speech volume is determined from this captured data.

[0206] Next, the system prompt volume is adjusted based on the user's speech volume obtained in step 257 (step 258).

[0207] This adjustment uses either a system prompt volume determination table or a system prompt volume calculation formula, each prepared on the basis of the user's speech volume. The processing is described in detail later.

[0208] Next, the type of guidance service is checked to determine whether it is the flight number service (step 259). This can be determined from the service type information already obtained in step 255.

[0209] For the flight number service (YES in step 259), the flow proceeds to step 260 in FIG. 14. For the route service (NO in step 259), it proceeds to step 270 in FIG. 15.

[0210] The flight number service is described first, with reference to FIG. 14.

[0211] In this processing, the flight number is obtained first (step 260).

[0212] The system prompt here is "Please say the flight number you would like to look up."

[0213] Suppose the user answers, for example, 'ichiichishichi-bin' (flight one-one-seven).

[0214] The device then checks whether this response is recognizable (step 261).

[0215] If it cannot be recognized (NO in step 261), the flow returns to step 260. If it can be recognized (YES in step 261), the flight number is confirmed (step 262).

[0216] The repeat prompt in this case is not "Ichiichishichi-bin, right?" but "Ichiichinana-bin, right?" How such a repeat prompt is created is described in detail later.
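The text of such a prompt can be sketched as a digit-to-reading substitution that always voices 7 as "nana", never "shichi". The reading table is an assumption, and so is voicing 4 as "yon". The sketch produces prompt text only. It does not address how the audio is assembled without the unnatural joins mentioned earlier, which this document describes later.

```python
# Readings used in repeat prompts: 7 is always "nana" (never "shichi").
PROMPT_DIGITS = {"0": "zero", "1": "ichi", "2": "ni", "3": "san", "4": "yon",
                 "5": "go", "6": "roku", "7": "nana", "8": "hachi", "9": "kyu"}


def repeat_prompt(recognized_digits: str) -> str:
    """Build the confirmation prompt text for a recognized flight number."""
    reading = "".join(PROMPT_DIGITS[d] for d in recognized_digits)
    return f"{reading}bin desu ne?"  # "...-bin, right?"


print(repeat_prompt("117"))
# ichiichinanabin desu ne?
```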

[0217] The user then utters a response. The response may or may not be recognizable. Even when it is recognizable, it may be either 'yes' or 'no'.

[0218] If the user's 'yes' is confirmed (YES in step 262), the flow proceeds to step 263, where the flight number is fixed.

[0219] The device then gives guidance on the operation status of that flight (step 264). It retrieves the operation status from a database, creates guidance information such as the following, and speaks it to the user: "Flight ichiichinana departed Oki Airport at 15:00 and arrived at Itami Airport at 16:30."

[0220] If, on the other hand, the user's 'no' is confirmed in step 262, or the response cannot be recognized, the recognition result is checked to see whether it contains a 1 or a 7 (step 265).

[0221] If the recognition result contains neither 1 nor 7 (NO in step 265), the flow returns to the flight number acquisition of step 260. If it contains a 1 or a 7 (YES in step 265), a prompt giving an example pronunciation of 7 is created (step 266). This is because "shichi" sounds similar to "ichi" and is easily misrecognized.

[0222] The device therefore creates and utters the following system prompt: "When you pronounce the number '7', please say 'nana'."

[0223] The flow then returns to the flight number acquisition of step 260.
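Steps 260 to 266 can be sketched as a loop driven by two callbacks. `say` plays a prompt, and `listen` returns the recognized text or `None` when recognition fails. Both callbacks, the prompt wording and the `max_turns` cut-off are assumptions. The flowchart itself loops without a stated limit, and the operation status guidance of step 264 is omitted.

```python
from typing import Callable, List, Optional

HINT = 'When you pronounce the number "7", please say "nana".'


def flight_number_dialogue(say: Callable[[str], None],
                           listen: Callable[[], Optional[str]],
                           max_turns: int = 5) -> Optional[str]:
    """Steps 260-266 of FIG. 14; listen() returns None if unrecognizable."""
    for _ in range(max_turns):
        say("Please say the flight number you would like to look up.")  # step 260
        number = listen()
        if number is None:              # step 261: NO, ask again
            continue
        say(f"Flight {number}, right?")  # step 262: repeat prompt
        if listen() == "yes":
            return number               # step 263: flight number fixed
        if {"1", "7"} & set(number):    # step 265: 'no' or unrecognized
            say(HINT)                   # step 266: pronunciation hint
    return None  # hypothetical cut-off, e.g. hand over to an operator


# Scripted run: "117" is first misrecognized as "111" and rejected.
spoken: List[str] = []
answers = iter(["111", "no", "117", "yes"])
print(flight_number_dialogue(spoken.append, lambda: next(answers)))  # 117
print(HINT in spoken)                                                 # True
```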

[0224] This concludes the flight number service.

[0225] Next, the route service is described with reference to FIGS. 15 and 16.

[0226] In this processing, the departure airport name is obtained first (step 270).

[0227] The system prompt here is "Please say the departure airport of the route you would like to look up."

[0228] Suppose the user answers, for example, 'Itami'.

[0229] The device then checks whether this response is recognizable (step 271).

[0230] If it cannot be recognized (NO in step 271), the flow returns to step 270. If it can be recognized (YES in step 271), the flow proceeds to step 272, where the airport name is confirmed.

[0231] In step 270, the departure airport is recognized from the user's response. A plurality of recognition candidates are obtained in order of confidence and stored as recognized airport name candidates (step 274). As described later, these stored candidates may be used in step 277.

[0232] What is confirmed in step 272 is the airport name with the highest confidence among the candidates recognized as described above.

[0233] The repeat prompt in the confirmation of step 272 is "Itami Airport, right?"

[0234] The user then responds. If the user's 'yes' is confirmed (YES in step 272), the flow proceeds to step 273, where the departure airport name is fixed.

[0235] Suppose, on the other hand, that the user said 'Itami' but the device misrecognized it, so the repeat prompt became, for example, "Kitami Airport, right?" The user then answers 'no' (NO in step 272), and the prefecture name is obtained (step 275).

[0236] The system prompt here is "Please say the name of the prefecture where the departure airport is located."

[0237] Suppose the user answers, for example, 'Osaka'.

[0238] The device then checks whether this response is recognizable (step 276).

[0239] If it cannot be recognized (NO in step 276), the flow returns to step 275. If it can be recognized (YES in step 276), the flow proceeds to step 277, where the airport name and the prefecture name are matched.

[0240] Here, the airport name is identified by matching the recognized airport name candidates stored in step 274 against the prefecture name acquired in step 275.

[0241] For example, if the recognized airport name candidates stored in step 274 include "Itami" and the prefecture name stored in step 275 is "Osaka", the departure airport can be identified as "Itami".

[0242] Once the departure airport name has been identified in this way, it is confirmed (step 278).

[0243] The system prompt in this case is "Is that Itami Airport in Osaka?"

[0244] If the user says "yes" (YES in step 278), the process proceeds to step 273, where the departure airport name is finalized.

[0245] If the user says "no" (NO in step 278), the process returns to step 275.

[0246] Once the departure airport name has been confirmed, the arrival airport name acquisition process of FIG. 16 is performed (step 280).

[0247] The system prompt in this case is "Please state the arrival airport name."

[0248] Suppose the user replies, for example, "Oki".

[0249] Next, it is checked whether this reply can be recognized (step 281).

[0250] If it cannot be recognized (NO in step 281), the process returns to step 280; if it can (YES in step 281), the process proceeds to step 282, where the airport name is confirmed.

[0251] In step 280, the arrival airport is recognized from the user's reply. Several recognition candidates are obtained in order of confidence and stored as recognized airport name candidates (step 286). The candidates stored in step 286 may be used in step 289, as described later.

[0252] Note that what is confirmed in step 282 is the airport name with the highest confidence among the recognition candidates obtained as described above.

[0253] The read-back prompt in the confirmation of step 282 is "Oki Airport, correct?"

[0254] The user then replies. If the user's reply is confirmed to be "yes" (YES in step 282), the process proceeds to step 283, where the arrival airport name is finalized.

[0255] On the other hand, if the user said "Oki" but the device misrecognized it, so that the read-back prompt is, for example, "Iki Airport, correct?", the user's answer in step 282 is "no" (NO in step 282). In this case, the prefecture name is acquired (step 287).

[0256] The system prompt in this case is "Please state the name of the prefecture where the arrival airport is located."

[0257] Suppose the user replies, for example, "Shimane".

[0258] Next, it is checked whether this reply can be recognized (step 288).

[0259] If it cannot be recognized (NO in step 288), the process returns to step 287; if it can (YES in step 288), the process proceeds to step 289, where the airport name and the prefecture name are matched.

[0260] The airport name is then identified by matching the recognized airport name candidates stored in step 286 against the prefecture name acquired in step 287.

[0261] For example, if the recognized airport name candidates stored in step 286 include "Oki" and the prefecture name stored in step 287 is "Shimane", the arrival airport can be identified as "Oki".

[0262] Once the arrival airport name has been identified in this way, it is confirmed (step 290).

[0263] The system prompt in this case is "Is that Oki Airport in Shimane?"

[0264] If the user says "yes" (YES in step 290), the process proceeds to step 283, where the arrival airport name is finalized.

[0265] If the user says "no" (NO in step 290), the process returns to step 287.

[0266] When the departure and arrival airport names have been finalized by the above processing, the route for which information is to be provided is determined (step 284), and the operating status of that route is announced (step 285). To do this, the operating status of the route is retrieved from a database, system prompts such as the following are composed, and they are spoken to the user: "Flight 117 departed Itami Airport at 9:00 and arrived at Oki Airport at 10:05." "Flight 265 leaves Itami Airport at 12:00 and is scheduled to arrive at Oki Airport at 13:05." "Flight 285 leaves Itami Airport at 16:00 and is scheduled to arrive at Oki Airport at 17:05."
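The status announcement of step 285 amounts to filling a sentence template from database rows. The sketch below is illustrative only: the `Flight` record, its field names, and the rule that switches to the past tense once the scheduled arrival time has passed are assumptions, since the description says only that the status is read from a database.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class Flight:
    """One row of a hypothetical flight-status table (field names are illustrative)."""
    number: str
    origin: str
    destination: str
    departs: time
    arrives: time


def hhmm(t: time) -> str:
    # Spoken-style clock time: 9:00, 10:05, 13:05
    return f"{t.hour}:{t.minute:02d}"


def status_prompt(flight: Flight, now: time) -> str:
    """Compose one status sentence; past tense once the arrival time has passed (assumed rule)."""
    dep, arr = hhmm(flight.departs), hhmm(flight.arrives)
    if now >= flight.arrives:
        return (f"Flight {flight.number} departed {flight.origin} Airport at {dep} "
                f"and arrived at {flight.destination} Airport at {arr}.")
    return (f"Flight {flight.number} leaves {flight.origin} Airport at {dep} "
            f"and is scheduled to arrive at {flight.destination} Airport at {arr}.")
```

Called at 11:00 with the three flights of the example, this reproduces the three sentences quoted above; a production system would more likely take actual arrival times from the database than infer them from the clock.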

[0267] The above is the processing procedure of the flight departure and arrival information service in the voice recognition response device according to this embodiment.

[0268] Next, the volume adjustment of the system prompt in step 258 of FIG. 13 will be described with reference to FIGS. 17 and 18.

[0269] As already described, in this embodiment the volume of the system prompt is adjusted according to the loudness of the user's speech.

[0270] Specifically, if the user speaks loudly, the system prompt volume is raised; if the user speaks softly, the system prompt volume is lowered.

[0271] The reasoning is as follows. A user may speak softly because (1) the system prompt is too loud, so the user holds the handset away from the ear; (2) the user habitually holds the handset slightly away; or (3) the user naturally has a quiet voice.

[0272] Lowering the system prompt volume therefore makes the prompt slightly harder to hear, which prompts the user to bring the handset closer to the face. In addition, people tend to raise their own voice when the voice they hear is quiet; the device exploits this tendency by lowering the system prompt volume when the user speaks softly.

[0273] Conversely, a user may speak loudly because (1) the system prompt is too quiet, so the user presses the handset too close to the face; or (2) the user naturally has a loud voice.

[0274] Raising the system prompt volume therefore encourages the user to hold the handset slightly farther from the face.

[0275] For this purpose, this embodiment prepares either a relational expression between the user's speech volume and the system prompt volume, as shown in FIG. 17, or a relation table 290 between the user's speech volume and the system prompt volume, as shown in FIG. 18.

[0276] In both FIG. 17 and FIG. 18, the system prompt volume is raised when the user's speech is loud and lowered when it is soft.
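Both forms of the mapping (an expression as in FIG. 17, a table as in FIG. 18) can be sketched as follows. The figures' actual values are not given in the text, so the dBFS levels, gain, offset, and band boundaries below are hypothetical; the only property taken from the description is that the prompt volume rises monotonically with the user's speech volume.

```python
from bisect import bisect_right

# Hypothetical output range for the system prompt, in dBFS.
PROMPT_MIN_DB, PROMPT_MAX_DB = -18.0, 0.0


def prompt_volume_by_expression(user_db: float, gain: float = 0.6, offset: float = 6.0) -> float:
    """FIG. 17 style: an increasing linear expression, clamped to a safe playback range."""
    raw = gain * user_db + offset
    return max(PROMPT_MIN_DB, min(PROMPT_MAX_DB, raw))


# FIG. 18 style: bands of measured user speech level mapped to prompt levels.
USER_DB_THRESHOLDS = [-40.0, -30.0, -20.0]     # band boundaries (hypothetical)
PROMPT_DB_BY_BAND = [-18.0, -12.0, -6.0, 0.0]  # one prompt level per band (hypothetical)


def prompt_volume_by_table(user_db: float) -> float:
    """Look up the band the user's level falls into and return its prompt level."""
    return PROMPT_DB_BY_BAND[bisect_right(USER_DB_THRESHOLDS, user_db)]
```

The table form is easier to tune by ear, while the expression form avoids audible jumps at band boundaries; clamping keeps a very loud or very quiet caller from driving the prompt outside a usable range.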

[0277] Next, the method of creating the read-back prompt in the recognition result confirmation of step 262 in FIG. 14 will be described. This applies to creating a prompt for a number read digit by digit, without place-value words.

[0278] In this processing, a total of 110 prompts are prepared as audio files, for example in WAV format: (1) ten one-digit prompts, 0.wav ("zero") to 9.wav ("kyuu"); and (2) one hundred two-digit prompts, 00.wav ("zero-zero"), 01.wav ("zero-ichi"), 02.wav ("zero-ni"), ..., 53.wav ("go-san"), ..., 99.wav ("kyuu-kyuu"). The read-back prompt is then assembled from these 110 prompts.
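The 110-file inventory (10 one-digit plus 100 two-digit prompts) can be enumerated as below. The file-name pattern follows the examples in the text; the romanized readings, including "yon" for 4 and "nana" for 7, are an assumption for illustration.

```python
# Romanized readings assumed for illustration (4 = "yon", 7 = "nana").
READINGS = ["zero", "ichi", "ni", "san", "yon", "go", "roku", "nana", "hachi", "kyuu"]

# (1) ten one-digit prompts: 0.wav ... 9.wav
ONE_DIGIT = {f"{d}.wav": READINGS[d] for d in range(10)}

# (2) one hundred two-digit prompts: 00.wav ... 99.wav, leading zero kept (e.g. 03.wav)
TWO_DIGIT = {f"{a}{b}.wav": f"{READINGS[a]}-{READINGS[b]}"
             for a in range(10) for b in range(10)}

# "0.wav" and "00.wav" are distinct keys, so the merged inventory has 110 entries.
PROMPTS = {**ONE_DIGIT, **TWO_DIGIT}
```

Recording each two-digit pair as a single file lets the second digit carry natural coarticulation with the first, which is what makes the concatenated read-back sound less choppy than ten isolated digits.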

[0279] To create a read-back prompt with an even number of digits (2n digits, where n is an integer), the two-digit prompts of (2) are used; that is, the read-back prompt is built from n two-digit prompts.

[0280] For example, to create the read-back prompt "345612", 34.wav, 56.wav, and 12.wav are concatenated.

[0281] To create a read-back prompt with an odd number of digits ((2n + 1) digits), n two-digit prompts from (2) and one one-digit prompt from (1) are used.

[0282] For example, to create the read-back prompt "1038859", 10.wav, 38.wav, 85.wav, and 9.wav are concatenated.

[0283] Consider the read-back prompt for an airline flight number. Since a flight number has three digits, one one-digit prompt and one two-digit prompt are used.

[0284] For example, to create the read-back prompt "035", 03.wav and 5.wav are concatenated.

[0285] When creating an odd-digit read-back prompt, one one-digit prompt must be used. In the example above, where "1038859" was created by concatenating 10.wav, 38.wav, 85.wav, and 9.wav, the one-digit prompt was used for the least significant digit.

[0286] Alternatively, the one-digit prompt can be used for the most significant digit.

[0287] For example, the same read-back prompt "1038859" can be created by concatenating 1.wav, 03.wav, 88.wav, and 59.wav.

[0288] It is also possible to use the one-digit prompt for neither the most significant nor the least significant digit.

[0289] For example, "1038859" can be created by concatenating 10.wav, 3.wav, 88.wav, and 59.wav.
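The segmentation rules of paragraphs [0279] to [0289] fit in one function. This is a sketch, not the patent's implementation: the name `digit_prompt_files` and the `single_at` parameter are hypothetical. `single_at` is the 0-based position of the single digit and must be even so that the digits on either side still pair up; it defaults to the least significant digit.

```python
from typing import List, Optional


def digit_prompt_files(digits: str, single_at: Optional[int] = None) -> List[str]:
    """Cover a digit string with two-digit prompts plus, for odd lengths, one one-digit prompt."""
    if not digits or not all(c in "0123456789" for c in digits):
        raise ValueError("expected a non-empty string of ASCII digits")
    n = len(digits)
    if n % 2 == 0:
        if single_at is not None:
            raise ValueError("an even-length string needs no one-digit prompt")
        single_at = -1              # never reached: every step consumes two digits
    elif single_at is None:
        single_at = n - 1           # default: one-digit prompt on the least significant digit
    elif single_at % 2 or not 0 <= single_at < n:
        raise ValueError("single_at must be an even index inside the string")

    files, i = [], 0
    while i < n:
        if i == single_at:
            files.append(digits[i] + ".wav")          # one-digit prompt, e.g. 9.wav
            i += 1
        else:
            files.append(digits[i:i + 2] + ".wav")    # two-digit prompt, e.g. 03.wav
            i += 2
    return files
```

For "1038859", `single_at` values of 6 (default), 0, and 2 give the three variants in [0282], [0287], and [0289]. Joining the returned files into one clip could be done with Python's standard `wave` module, provided all recordings share the same sample rate and format.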

[0290] Next, the matching of the airport name and the prefecture name in step 277 of FIG. 15 and step 289 of FIG. 16 will be described. As already mentioned, when the departure or arrival airport name has been misrecognized, this processing matches the recognized airport name candidates already acquired against the newly acquired prefecture name, and determines the airport name most likely to be correct.

[0291] The case in which the departure airport name has been misrecognized is described below with reference to FIG. 19.

[0292] In this processing, recognized airport name candidates are first acquired (step 274) during the departure airport name acquisition (step 270) of FIG. 15. In FIG. 19, reference numeral 291 denotes the recognized airport name candidates acquired at this time; based on the user's utterance, "Kitami", "Itami", and "Iwami" are stored in order of priority. The apparatus also stores, for each prefecture, a table relating airports to their locations: for example, table 292 for Hokkaido, where Kitami Airport is located; table 293 for Osaka Prefecture, where Itami Airport is located; and table 294 for Tottori Prefecture, where Iwami Airport is located.

[0293] When a misrecognition has occurred, the user is asked to state the name of the prefecture where the airport is located (step 275); suppose the user utters "Osaka" (utterance 295).

[0294] Table 293 for Osaka Prefecture is then consulted, and an airport name that appears both among the recognized airport name candidates 291 and in table 293 is searched for. This is the matching of the airport name and the prefecture name.

[0295] As a result, the airport named by the user is found to be Itami Airport.
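The matching of [0294] is essentially an ordered intersection: walk the confidence-ranked candidate list and keep the first name that the stated prefecture's table contains. In the sketch below the table contents are a small illustrative subset in the spirit of FIG. 19 (only the Hokkaido and Osaka entries come from that figure; the Shimane entry comes from the arrival example), and returning `None` when nothing matches is an assumed fallback.

```python
from typing import Dict, List, Optional, Set

# Illustrative per-prefecture airport tables (cf. tables 292 and 293 in FIG. 19).
AIRPORTS_BY_PREFECTURE: Dict[str, Set[str]] = {
    "Hokkaido": {"Kitami"},
    "Osaka": {"Itami"},
    "Shimane": {"Oki"},
}


def match_airport(candidates: List[str], prefecture: str) -> Optional[str]:
    """Return the highest-priority candidate located in the stated prefecture, if any."""
    table = AIRPORTS_BY_PREFECTURE.get(prefecture, set())
    for name in candidates:          # candidates are ordered by recognition confidence
        if name in table:
            return name
    return None                      # no match: the dialogue would ask again (assumed)
```

With candidates 291 ("Kitami", "Itami", "Iwami") and the utterance "Osaka", the function returns "Itami", the result described in [0295]; the subsequent confirmation prompt (step 278) still guards against a wrong match.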

[0296] As already mentioned, in a dialogue with the user, the user's utterance may be unrecognizable or misrecognized, and such failures must not be allowed to repeat many times.

[0297] "Unrecognizable" is a problem that arises before misrecognition: it occurs when the user's voice cannot be heard or when the user says something not anticipated in the current state. Misrecognition, by contrast, tends to occur when there is a problem with the way the user speaks.

[0298] However, because the user generally does not know why an utterance was unrecognizable or misrecognized, the user cannot adapt, for example by rephrasing. The user therefore repeats the same wording, and recognition failures and misrecognitions recur again and again.

[0299] Therefore, in this embodiment, when recognition failures or misrecognitions have repeated a specified number of times, the user's own earlier utterances are used in the read-back prompt. This makes the user aware of shortcomings in the way they speak, such as speaking too fast, too softly, or indistinctly, and so prevents the failures from recurring.

[0300] FIG. 20 is a flowchart of the processing procedure in this case, taking recognition of the arrival airport name as an example.

[0301] In this processing, before the arrival airport name is recognized, a counter n indicating the number of recognition failures and the like is initialized (step 300).

[0302] Next, it is confirmed that n is less than the specified count α (step 301), and n is incremented by 1 (step 302).

[0303] The arrival airport name is then recognized (step 303); the system prompt at this point is "Please state the arrival airport name."

[0304] If the user's utterance "Itami" or "Tami" is recognized, recognition is judged successful. "Tami" alone is accepted because the initial "i" of "Itami" is weakly pronounced and often fails to set off the recognition trigger (the start of recognition). The syllable "ta" is therefore used as the trigger, and "tami" is included in the recognition vocabulary.

[0305] Recognition fails, on the other hand, when the user is silent or says something unrelated to the current state, such as "yes".

[0306] Next, in step 304, it is checked whether recognition succeeded; if it did not (NO in step 304), the process returns to step 301.

[0307] If recognition succeeded (YES in step 304), the arrival airport name is confirmed (step 305).

[0308] If the recognized data is "Itami", the system prompt at this point is (1) "Is Itami Airport correct?"

[0309] If the recognized data is "Tami" and "Tami" has been recognized as Kitami, the prompt is (2) "Is Kitami Airport correct?"

[0310] The user answers "yes" to system prompt (1) (YES in step 305), and the recognition result is then finalized (step 306).

[0311] The user answers "no" to system prompt (2) (NO in step 305), so the process returns to step 301; if n is still less than α, the processing from step 301 onward is performed again.

[0312] In the above processing, each word the user uttered (shown in quotation marks above) is stored each time, exactly as the user actually said it.

[0313] If the arrival airport name then continues to go unconfirmed (NO in step 305) and n is judged in step 301 to be α or more, the process proceeds to step 308, where a prompt that plays the user's utterance back is generated. This prompt is generated from the stored content of the user's utterances.
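The counter logic of steps 300 to 306 can be traced with a scripted dialogue. In this sketch the `Turn` record is a hypothetical stand-in for one pass through recognition (step 303) and confirmation (step 305), and exhausting α attempts is signalled by returning `None`, meaning "go to step 308". The description does not say whether n is reset after the step-308 prompt, so the sketch leaves that to the caller.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class Turn:
    """One scripted user turn (hypothetical stand-in for steps 303-305)."""
    heard: str                  # what the user actually said ("" for silence)
    recognized: Optional[str]   # recognized airport name, or None if unrecognizable
    confirmed: bool = False     # the user's yes/no to the read-back prompt


def ask_arrival_airport(turns: Iterable[Turn], alpha: int) -> Tuple[Optional[str], List[str]]:
    """Return (airport, stored utterances); airport is None once alpha attempts are used up."""
    utterances: List[str] = []
    turn_iter = iter(turns)
    n = 0                                       # step 300: initialize the counter
    while n < alpha:                            # step 301: is n still below alpha?
        n += 1                                  # step 302
        turn = next(turn_iter)                  # step 303: prompt and recognize
        utterances.append(turn.heard)           # store the raw utterance (cf. [0312])
        if turn.recognized is None:             # step 304: NO -> back to step 301
            continue
        if turn.confirmed:                      # step 305: YES
            return turn.recognized, utterances  # step 306: finalize
    return None, utterances                     # n >= alpha: proceed to step 308
```

Keeping the raw utterances alongside the loop is what makes step 308 possible: the playback prompt is built from `utterances` rather than from the recognizer's (possibly wrong) interpretation.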

[0314] The system prompt here is as follows.

[0315] (1) For example, if the user was silent: "Your voice is too quiet."
(2) If the user said something unrelated: "When stating the arrival airport, you are saying 'yes'."
(3) If the utterance was misrecognized: "When stating the arrival airport, you are saying 'Tami'."
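The three step-308 prompts of [0315] reduce to one decision: whether anything was heard at all. The sketch below works on stored text only; whether the real device inserts a text echo or splices in the recorded audio is not specified, and both the helper name `echo_prompt` and the choice to echo the most recent non-silent utterance are assumptions.

```python
from typing import List


def echo_prompt(utterances: List[str], slot: str = "the arrival airport") -> str:
    """Build the step-308 prompt from the stored utterances (text-only sketch)."""
    spoken = [u for u in utterances if u.strip()]
    if not spoken:
        # Case (1): nothing was heard, so point the user at their loudness.
        return "Your voice is too quiet."
    # Cases (2) and (3): echo what the user actually said, so they can hear an
    # off-topic answer ("yes") or a clipped word ("tami") for themselves.
    return f"When stating {slot}, you are saying '{spoken[-1]}'."
```

Cases (2) and (3) share one template, which matches the source: the device does not need to diagnose the fault, only to let the user hear it.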

[0316] The processing from step 301 onward is then performed again. In the arrival airport name recognition of step 303, the user is expected to respond to the above system prompts by changing the way they speak, as follows.

[0317] In response to system prompt (1), the user speaks somewhat louder.

[0318] In response to system prompt (2), the user states the name of an existing airport.

[0319] In response to system prompt (3), the user pronounces the "i" of "Itami" more loudly and clearly.

[0320] When the arrival airport name is confirmed after such a modified system prompt (YES in step 305), the recognition result is finalized (step 306).

[0321] In this type of apparatus, when the amount of data is very large, as when recognizing area codes, it must be made easy to obtain the data actually intended.

[0322] A technique that makes it easier to obtain the intended data when each category contains a very large number of entries is therefore described below, taking area codes as an example.

[0323] FIG. 21 is a flowchart of the processing procedure in this case.

[0324] In this processing, the count n of recognition failures and misrecognitions is first initialized (step 320).

[0325] Next, the national area code dictionary is set (step 321).

[0326] The area code dictionaries are described here with reference to FIG. 22. They comprise: a national area code dictionary 310, in which all two- to five-digit area codes used nationwide are registered; prefecture-level dictionaries, in which the area codes of each prefecture are registered, such as the Hokkaido area code list dictionary 311, the Tokyo area code list dictionary 312, the Osaka area code list dictionary 313, and the Kagoshima area code list dictionary 314; and municipality-level dictionaries, created for each prefecture, in which area codes are registered in association with the names of the municipalities in that prefecture, such as the Hokkaido municipality-name/area-code dictionary 315, the Tokyo municipality-name/area-code dictionary 316, the Osaka municipality-name/area-code dictionary 317, and the Kagoshima municipality-name/area-code dictionary 318.
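The three dictionary levels of FIG. 22 nest naturally. In this sketch the entries are a tiny illustrative subset of real Japanese area codes, and deriving the prefecture and national levels from the municipality level is a design assumption (it keeps the three levels consistent by construction); the patent describes them only as separate dictionaries.

```python
from typing import Dict, Set

# Municipality level (cf. dictionaries 315-318): prefecture -> municipality -> area code.
MUNICIPALITY_CODES: Dict[str, Dict[str, str]] = {
    "Hokkaido": {"Sapporo": "011", "Kutchan": "0136"},
    "Tokyo": {"Chiyoda": "03"},
    "Osaka": {"Osaka City": "06"},
    "Kagoshima": {"Kagoshima City": "099"},
}

# Prefecture level (cf. dictionaries 311-314): every code used in each prefecture.
PREFECTURE_CODES: Dict[str, Set[str]] = {
    pref: set(towns.values()) for pref, towns in MUNICIPALITY_CODES.items()
}

# National level (cf. dictionary 310): every code in every prefecture.
NATIONAL_CODES: Set[str] = set().union(*PREFECTURE_CODES.values())
```

Each level down shrinks the set of strings the recognizer must tell apart, which is the lever the following steps pull after each failed attempt.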

[0327] In step 321, therefore, the national area code dictionary 310, in which the area codes of the whole country are registered, is set from among the area code dictionaries shown in FIG. 22. In the area code recognition that follows, digit combinations other than those registered in the national area code dictionary 310 are not recognized.

[0328] Next, area code recognition is performed (step 322).

[0329] The system prompt here is "Please tell us the area code of your telephone number."

[0330] The user utters a reply, and it is checked whether the reply can be recognized (step 323).

[0331] If, for example, the user's reply "zero-ichi-san-roku" (0136) is recognized (YES in step 323), the area code confirmation of step 324 is performed. If the user is silent or says something unrelated to this state, such as "yes", recognition fails (NO in step 323) and the process proceeds to step 326.

[0332] Suppose the system prompt in the area code confirmation of step 324 is "Is zero-ichi-san-roku correct?"

[0333] If the user then says "yes", the recognized area code has been confirmed (YES in step 324); the area code is finalized (step 325), and processing in this state ends.

[0334] Suppose instead that the system prompt in the area code confirmation of step 324 is "Is zero-nana-san-roku correct?"

[0335] If the user then says "no", the area code has been misrecognized (NO in step 324), and the process proceeds to step 326.

[0336] Step 326 is reached whenever recognition has failed or a misrecognition has occurred, so the count n of such events is incremented by 1.

[0337] Next, region recognition is performed (step 327) so that the prefecture-level and municipality-level dictionaries shown in FIG. 22 can be used.

[0338] For the first region recognition, that is, when n = 1, the system prompt is "Please state the name of the prefecture."

[0339] For the second region recognition, that is, when n = 2, the system prompt is "Please state the names of the prefecture and the municipality."

[0340] In response, for the first region recognition (n = 1), the user says, for example, "Hokkaido".

[0341] For the second region recognition (n = 2), the user says, for example, "Kutchan Town in Hokkaido".

[0342] Next, it is checked whether the region was recognized (step 328). If it was not (NO in step 328), the process returns to step 327; if it was (YES in step 328), the recognized region is confirmed (step 329).

[0343] For the first region recognition (n = 1), the system prompt is "Is Hokkaido correct?"

[0344] For the second region recognition (n = 2), the system prompt is "Is Kutchan Town, Hokkaido correct?"

[0345] If the user says "yes" (YES in step 329), the process proceeds to step 330; if the user says "no" (NO in step 329), the process returns to step 327.

[0346] In step 330, it is checked whether n = 1, that is, whether this is the first or the second region recognition. If n = 1 (YES in step 330), a prefecture-level regional area code dictionary is set (step 331). That is, from among the prefecture-level dictionaries of FIG. 22 in which area codes are registered per prefecture (the Hokkaido area code list dictionary 311, the Tokyo area code list dictionary 312, the Osaka area code list dictionary 313, the Kagoshima area code list dictionary 314, and so on), the area code list dictionary of the recognized prefecture is set.

[0347] The area code recognition from step 322 onward is then performed again. Because this pass uses a regional dictionary restricted to one prefecture, recognition failures and misrecognitions decrease. For example, when the Hokkaido area code list dictionary 311 is used, only area codes beginning with "01" are recognized, since every Hokkaido area code begins with "01". Recognition failures and misrecognitions are therefore reduced.
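The effect of swapping the national dictionary for a prefecture dictionary can be illustrated by filtering a recognizer's ranked hypotheses against the active vocabulary. This is a simplification: a real recognizer would constrain its grammar during decoding rather than filter afterwards, and the codes, the n-best list, and the helper names `active_dictionary` and `best_in_vocabulary` are all hypothetical.

```python
from typing import Dict, List, Optional, Set

# Illustrative subset of real area codes, grouped by prefecture.
BY_PREFECTURE: Dict[str, Set[str]] = {
    "Hokkaido": {"011", "0136"},
    "Tokyo": {"03"},
    "Osaka": {"06"},
    "Kagoshima": {"099"},
}
NATIONAL: Set[str] = set().union(*BY_PREFECTURE.values())


def active_dictionary(n: int, prefecture: Optional[str] = None) -> Set[str]:
    """n == 0: national dictionary (step 321); after one failure: the prefecture's list (step 331)."""
    return NATIONAL if n == 0 else BY_PREFECTURE[prefecture]


def best_in_vocabulary(nbest: List[str], vocab: Set[str]) -> Optional[str]:
    """Return the highest-ranked hypothesis that the active dictionary allows."""
    for hyp in nbest:
        if hyp in vocab:
            return hyp
    return None
```

With the Hokkaido list active, a hypothesis such as "06" can no longer win even if it scores best, because every allowed code starts with "01".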

[0348] If n = 2, that is, for the second region recognition (NO in step 330), a municipality-name/area-code dictionary is set (step 332). That is, from among the municipality-level dictionaries of FIG. 22, which are created per prefecture and register area codes in association with the names of the municipalities in that prefecture (the Hokkaido municipality-name/area-code dictionary 315, the Tokyo municipality-name/area-code dictionary 316, the Osaka municipality-name/area-code dictionary 317, the Kagoshima municipality-name/area-code dictionary 318, and so on), the dictionary of the recognized prefecture is set.

[0349] For example, if the confirmed prefecture and municipality names are "Kutchan-cho, Hokkaido", the area code is searched for using the Hokkaido municipality name-area code relation dictionary 315 (step 333).

[0350] That is, Kutchan-cho is looked up in the Hokkaido municipality name-area code relation dictionary 315, which associates Hokkaido municipality names with area codes, and its area code is retrieved. The area code is then finalized (step 325).
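The dictionary switch at step 330 and the lookup at step 333 can be sketched as follows. This minimal sketch uses hypothetical dictionaries standing in for 311 and 315, and the entries are illustrative samples only. The names `select_dictionary` and `lookup_area_code` are also hypothetical.

```python
from typing import Optional

# Hypothetical stand-ins for the FIG. 22 dictionaries; entries are samples only.
AREA_CODE_LISTS = {"Hokkaido": ["011", "0136"]}                  # cf. 311
MUNICIPALITY_AREA_CODES = {                                      # cf. 315
    "Hokkaido": {"Sapporo-shi": "011", "Kutchan-cho": "0136"},
}


def select_dictionary(n: int, prefecture: str):
    """Step 330: the first region recognition (n = 1) uses the prefecture's
    area code list (step 331); the second (n = 2) uses its municipality
    name-area code relation dictionary (step 332)."""
    if n == 1:
        return AREA_CODE_LISTS[prefecture]
    return MUNICIPALITY_AREA_CODES[prefecture]


def lookup_area_code(prefecture: str, municipality: str) -> Optional[str]:
    """Step 333: look up the confirmed municipality; None if it is not registered."""
    return select_dictionary(2, prefecture).get(municipality)


print(lookup_area_code("Hokkaido", "Kutchan-cho"))  # 0136
```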

[0351] The second embodiment therefore provides the following advantages.
(1) The same system prompt is repeated as little as possible during a dialogue, so the user is not made uncomfortable.
(2) Instead of repeating the same question, the apparatus asks related questions to narrow down the recognition result. This raises recognition accuracy and shortens the dialogue.
(3) Based on how the user responds, the apparatus prompts the user in ways that make the speech easier to recognize. This raises recognition accuracy and likewise shortens the dialogue.
(4) Natural-sounding number read-back prompts are generated, so easy-to-hear system prompts can be produced at low cost.

[0352] As already mentioned, conventional speech recognition response apparatuses confirm the user's utterance using one of roughly two methods.

[0353] (1) Each item to be confirmed is presented individually, and the user is asked to answer for each item.

[0354] For example, the user answers "Yes" to the system prompt "Is the departure airport Haneda?", or answers "Yes" to the system prompt "Is the arrival airport Itami?"

[0355] (2) Several items to be confirmed are presented at once, and the user is asked to answer for all of them together.

[0356] For example, the user answers "Yes" to the system prompt "Is the departure airport Haneda and the arrival airport Itami?"

[0357] However, these conventional confirmation methods have the following problems.

[0358] First, with individual confirmation as in (1), confirmation takes a long time and the dialogue becomes mechanical.

[0359] When several items are confirmed at once as in (2) and the user answers "No", it is difficult to determine which of the items was misrecognized. Confirmation therefore actually takes longer than in case (1).

[0360] Furthermore, in conventional speech recognition response apparatuses, the user must give a confirmation response to a system prompt from the apparatus. If the user has been told beforehand how to respond, the user should normally respond as instructed. Some users, however, become confused or miss the instruction, do not know what to answer, and cannot proceed.

[0361] As a third embodiment, therefore, the following describes a speech recognition response apparatus that monitors the dialogue between the apparatus and the user. It changes the question format according to the state of that dialogue and thereby achieves a natural dialogue close to one between people.

[0362] FIG. 23 is a block diagram showing the overall configuration of the speech recognition response apparatus according to this embodiment. In FIG. 23, the speech recognition response apparatus 340 comprises a voice input device 341, a voice recognition device 342, a recognized-word input time monitoring device 343, a recognition result determination device 344, a dialog flow management device 345, a voice prompt synthesizer 346, a voice output device 347, a frame information storage unit 348, a dialog flow database (dialog flow DB) 349, a prompt database (prompt DB) 350 and an output prompt storage unit 351.

[0363] The voice input device 341 consists of, for example, the microphone of a telephone handset, and converts a voice signal into an electric signal.

[0364] The voice recognition device 342 converts the electric signal obtained by the voice input device 341 into character information. The speech recognition processing itself uses a known method, and its detailed description is omitted.

[0365] The recognized-word input time monitoring device 343 detects the time at which the user's utterance is recognized. In this embodiment, the utterance time is measured relative to the moment the apparatus starts outputting the synthesized prompt.

[0366] For example, in FIG. 31, suppose the user says "Yes" 461 at time a1 while "Departing Sapporo" (a) is being output. The apparatus then knows that "Yes" 461 was uttered at time a1. This processing is described in detail later.

[0367] The recognition result determination device 344 performs the following processing on the result recognized by the voice recognition device 342.

[0368] (1) It determines the content of the user's utterance as recognized by the voice recognition device 342. In this embodiment, "Yes" is classified as a "positive word", "No" as a "negative word", an airport name or a date as a "related word", no recognition result as "not input", and anything else as "unrecognizable". In addition, the output end time is calculated from the timetable of the synthesized prompt. If the user does not speak within a predetermined time after that end time, the result is classified as "not input".
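The five-way classification can be sketched as below. This is a minimal sketch under stated assumptions. The vocabularies are hypothetical English stand-ins for the examples in the text (はい, いいえ, 違う, and airport names or dates). The waiting period `timeout` is an assumed parameter, since the text says only "a predetermined time". A recognizer output of None models "no recognition result".

```python
from typing import Optional

# Hypothetical vocabularies standing in for the patent's examples.
POSITIVE_WORDS = {"yes"}
NEGATIVE_WORDS = {"no", "wrong"}
RELATED_WORDS = {"Sapporo", "Iki", "Oki", "August 7"}  # airport names / dates


def classify(text: Optional[str], spoken_at: Optional[float],
             prompt_end: float, timeout: float) -> str:
    """Classify one recognition result into the five categories of [0368].

    text       -- recognizer output, or None when nothing was recognized
    spoken_at  -- utterance time in seconds from the start of prompt output
    prompt_end -- end of prompt output, taken from the prompt's timetable
    timeout    -- assumed waiting period after prompt_end
    """
    if text is None or spoken_at is None or spoken_at > prompt_end + timeout:
        return "not input"
    if text in POSITIVE_WORDS:
        return "positive"
    if text in NEGATIVE_WORDS:
        return "negative"
    if text in RELATED_WORDS:
        return "related"
    return "unrecognizable"
```

Checking the timeout first means that a late "yes" is still treated as "not input", which matches the rule that an utterance outside the waiting period counts as no response.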

[0369] (2) It also determines which confirmation item the user's utterance was directed at. In this embodiment, the determination uses the utterance time detected by the recognized-word input time monitoring device 343 together with the timetable of the synthesized prompt being output.
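Step (2) amounts to locating the utterance time within the prompt's timetable. This is a minimal sketch that assumes the timetable is available as rows of (item, start, end). Times are in seconds from the start of prompt output, the same reference point used by the monitoring device 343. The function name and row layout are hypothetical.

```python
def item_at(table: list[tuple[str, float, float]], t: float) -> str:
    """Return the confirmation item whose segment [start, end) contains time t.

    table -- (item, start, end) rows in playback order. An utterance made
             after the prompt has ended is attributed to the last row.
    """
    for item, start, end in table:
        if start <= t < end:
            return item
    return table[-1][0]
```

Half-open intervals make the assignment unambiguous. An utterance exactly at a boundary belongs to the segment that is starting.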

[0370] The dialog flow management device 345 controls the dialog flow as a whole and changes it according to the determination results of the recognition result determination device 344. For example, if a "negative word" is detected while the arrival airport part of the prompt is being output, the flow moves to the arrival airport confirmation state. It also moves to that state if a "related word" such as an airport name is recognized. If "not input" or "unrecognizable" is determined after the confirmation prompt has been output, the flow moves to the example prompt.

[0371] The voice prompt synthesizer 346 synthesizes the voice prompt to be output according to the dialog flow selected by the dialog flow management device 345. At the same time, it creates a table associating the playback time of the output prompt with the confirmation items.
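Building the prompt and its playback-time/item table in a single pass can be sketched as follows. This is a minimal sketch that assumes each prompt part's duration is known in advance, for example as the length of a recorded clip from the prompt DB 350. The `Segment` class, the `synthesize` function and the English prompt text are hypothetical illustrations. The usage durations are the values later given for FIG. 31.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    item: str     # confirmation item, or "closing" for the fixed wording
    text: str
    start: float  # seconds from the start of prompt output
    end: float


def synthesize(parts: list[tuple[str, str, float]]) -> tuple[str, list[Segment]]:
    """Join prompt parts and build the playback-time/confirmation-item table.

    parts -- (item, text, duration) triples in playback order
    """
    table: list[Segment] = []
    t = 0.0
    for item, text, duration in parts:
        table.append(Segment(item, text, t, t + duration))
        t += duration
    return " ".join(p[1] for p in parts), table


prompt, table = synthesize([
    ("departure", "Departing Sapporo,", 2.0),
    ("arrival", "bound for Oki Airport,", 4.0),
    ("date", "on July 7:", 3.0),
    ("closing", "is this flight all right?", 6.0),
])
```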

[0372] The voice output device 347 consists of, for example, the speaker of a telephone handset, and converts the voice prompt synthesized by the voice prompt synthesizer 346 into an audio signal.

[0373] The frame information storage unit 348 holds the results determined by the recognition result determination device 344.

[0374] The dialog flow database (dialog flow DB) 349 holds the dialog flows required in advance. In this embodiment, it holds the dialog flows for acquiring the departure place, the arrival place and the date, as well as the dialog flows requested by the dialog flow management device 345.

[0375] The prompt database (prompt DB) 350 holds the data the voice prompt synthesizer 346 needs to synthesize voice prompts. In this embodiment, this includes various airport names, date data and commonly used phrases.

[0376] The output prompt storage unit 351 holds information about the prompts synthesized by the voice prompt synthesizer 346. In this embodiment, it holds the confirmation prompt and its corresponding timetable.

[0377] This completes the description of the configuration. The operation is described next with reference to the flowcharts of FIGS. 24 to 27.

[0378] In this processing, input information is first acquired from the user. Specifically, the departure place, the arrival place and the date are acquired (step 360).

[0379] Next, the voice prompt synthesizer 346 creates a confirmation prompt from the acquired input information. At the same time, it creates the table associating playback times with confirmation items (step 361).

[0380] Next, output of the confirmation prompt is requested (step 362) and the prompt is output (step 363). It is then checked whether output of the confirmation prompt has been completed (step 364). If it has not (NO in step 364), the process proceeds to step 370 in FIG. 25.

[0381] If output of the confirmation prompt has been completed (YES in step 364), it is checked whether there is input from the user (step 365). If there is none (NO in step 365), an example prompt is output (step 366) and the process returns to step 365.

[0382] If there is input from the user (YES in step 365), it is checked whether it is the negative input "No" (step 367). If it is (YES in step 367), the confirmation prompt is revised with the denied items excluded (step 368), and the process returns to step 362.

[0383] If the input is not "No" (NO in step 367), it is checked whether it is the positive input "Yes" (step 369). If it is not (NO in step 369), an example prompt is output (step 366) and the process returns to step 365.

[0384] If the positive input "Yes" is received (YES in step 369), this processing ends.

[0385] Next, the processing performed when output of the confirmation prompt has not been completed (NO in step 364) is described with reference to FIG. 25.

[0386] In this processing, it is checked whether there is input from the user (step 370). If there is none (NO in step 370), the process returns to step 364 in FIG. 24.

[0387] If there is input from the user (YES in step 370), it is checked whether it is the negative input "No" (step 371). If it is (YES in step 371), the process proceeds to step 380 in FIG. 26.

[0388] If the input is not "No" (NO in step 371), it is checked whether it is a related word, such as "Iki Airport" (step 372). If it is (YES in step 372), the process proceeds to step 390 in FIG. 27.

[0389] If the input is not a related word (NO in step 372), it is checked whether it is the positive input "Yes" (step 373). If it is (YES in step 373), the affirmed item is checked off (step 374) and the process returns to step 364 in FIG. 24.

[0390] If the input is not positive either (NO in step 373), the process likewise returns to step 364 in FIG. 24.
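The branches of FIGS. 24 and 25 can be condensed into a single dispatch function, sketched below as an illustration only. The function name and the action strings are hypothetical. The category labels are those of [0368], with "none" added for no input. The follow-up dialogues of FIGS. 26 and 27 are represented only by the action names that lead to them.

```python
def handle_input(category: str, item: str, output_done: bool,
                 affirmed: set[str]) -> str:
    """One pass of the confirmation loop of FIGS. 24 and 25.

    category    -- "positive", "negative", "related", or "none" (no input)
    item        -- confirmation item being output when the input arrived
    output_done -- True once the whole confirmation prompt has been output (step 364)
    affirmed    -- items affirmed mid-prompt; updated in place (step 374)
    """
    if not output_done:                   # FIG. 25
        if category == "negative":
            return "stop_and_reinput"     # step 371 YES -> step 380 (FIG. 26)
        if category == "related":
            return "stop_and_reconfirm"   # step 372 YES -> step 390 (FIG. 27)
        if category == "positive":
            affirmed.add(item)            # step 374
        return "keep_outputting"          # back to step 364
    # FIG. 24, after output has completed
    if category == "negative":
        return "revise_prompt"            # step 368, then step 362
    if category == "positive":
        return "done"                     # step 369 YES
    return "example_prompt"               # step 365 NO or step 369 NO -> step 366
```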

[0391] Next, the processing performed when the negative input "No" is received in step 371 of FIG. 25 is described with reference to FIG. 26.

[0392] In this case, output of the confirmation prompt is first stopped (step 380), and the item to be reconfirmed is identified from the time of the user's utterance and extracted (step 381). A re-input request prompt is then output (step 382).

[0393] Next, it is checked whether there is input from the user (step 383). If there is none (NO in step 383), the process returns to step 382. If there is (YES in step 383), a reconfirmation prompt is output (step 384) and the apparatus waits for a response from the user (step 385).

[0394] If there is no input from the user (NO in step 385), the process returns to step 384. If there is input, the process returns to step 368 in FIG. 24.

[0395] Next, the processing performed when a related word is input in step 372 of FIG. 25 is described with reference to FIG. 27.

[0396] In this processing, output of the confirmation prompt is stopped (step 390). The item presumed to be under confirmation at the time of the user's utterance is then extracted as the item to be reconfirmed (step 391).

[0397] Next, a reconfirmation prompt incorporating the related word is created (step 392) and output (step 393). The apparatus then waits for a response from the user (step 394).

[0398] If there is no input from the user (NO in step 394), the process returns to step 393. If there is input, the process returns to step 368 in FIG. 24.

[0399] This completes the processing procedure of this embodiment. Next, the interaction among the components of FIG. 23 is described with reference to FIG. 28.

[0400] First, the voice input device 341 converts the voice signal input by the user into an electric signal.

[0401] The voice signal converted into an electric signal by the voice input device 341 is input to both the voice recognition device 342 and the recognized-word input time monitoring device 343. The recognized-word input time monitoring device 343 detects the input time of the utterance entered through the voice input device 341 and passes it to the voice recognition device 342.

[0402] The voice recognition device 342 converts the voice signal, now an electric signal, into character information. It switches its recognition dictionary using the timetable held in the output prompt storage unit 351 and the utterance time detected by the recognized-word input time monitoring device 343. If the recognized-word input time monitoring device 343 is not provided, a timer that measures the elapsed output time of the prompt is required instead.

[0403] The recognition result determination device 344 uses the timetable of the output prompt held in the output prompt storage unit 351 and the utterance time detected by the recognized-word input time monitoring device 343 to identify the item the utterance refers to. It then determines the content of the result recognized by the voice recognition device 342. The result is stored in the frame information storage unit 348 and passed to the dialog flow management device 345.

[0404] The recognition result determination device 344 also uses the output prompt timetable stored in the output prompt storage unit 351 and the utterance time data detected by the recognized-word input time monitoring device 343 to check for silence. If no user utterance is recognized for a certain time or longer after prompt output ends, it stores that information in the frame information storage unit 348 and passes it to the dialog flow management device 345.

[0405] The dialog flow management device 345 determines the dialog flow based on the data output from the recognition result determination device 344 and the data stored in the frame information storage unit 348. Referring to the data stored in the dialog flow database (dialog flow DB) 349, it outputs the determined flow to the voice prompt synthesizer 346.

[0406] The voice prompt synthesizer 346 creates a voice prompt from the information obtained from the dialog flow management device 345, referring to the data stored in the prompt database (prompt DB) 350. At the same time, it creates a timetable for the prompt. The created prompt and timetable are stored in the output prompt storage unit 351.

[0407] The voice output device 347 then outputs the voice prompt generated by the voice prompt synthesizer 346.

[0408] This completes the description of the interaction among the components of FIG. 23.

[0409] Next, the case in which the dialog flow is changed according to the time of the user's utterance and the determination of its content is described with reference to FIG. 29.

[0410] This processing concerns changes to the dialog flow of the confirmation prompt after the following exchange:
- The system prompts "Your departure place, please." (step 420) and the user answers "Sapporo" (step 421).
- The system prompts "Your destination, please." (step 422) and the user answers "Iki" (step 423).
- The system prompts "Your departure date, please." (step 424) and the user answers "August 7" (step 425).

[0411] In this case, the confirmation system prompt is first "Departing Sapporo, bound for Oki Airport, on July 7: is this flight all right?" (step 426). That is, "Iki" has been misrecognized as "Oki" and "August 7" as "July 7".

[0412] The dialog flow is then changed as follows according to the user's utterance.

[0413] (1) User utterance "Iki" (step 427): If the user utters a "related word" such as "Iki" during or immediately after output of the arrival place part of the confirmation prompt, the flow moves to the arrival place confirmation state.

[0414] That is, the system prompt "Is Iki Airport correct?" is output (step 432). If the user answers affirmatively with "Yes" (step 433), processing for the items up to the arrival place ends. For any other utterance, the process returns to step 426.

[0415] (2) User utterance "Wrong" (step 428): If the user utters a "negative word" such as "Wrong" during or immediately after output of the date part of the confirmation prompt, the flow moves to the date confirmation state.

[0416] That is, the system prompt "Your departure date, please." is output (step 434). When the user says "August 7" (step 435), the confirmation prompt "Is August 7 correct?" is output (step 436). If the user then answers affirmatively with "Yes" (step 437), the process returns to step 426.

[0417] (3) User utterance "Yes" (step 429): If the user utters a "positive word" such as "Yes" after output of the confirmation prompt has ended, the flow moves either to the next phase or to the closing prompt of the dialog flow (step 438).

[0418] (4) No user utterance (step 430): If no user utterance is recognized after the confirmation prompt is output, the flow moves to a prompt that shows how to reply, and an example prompt such as "Please answer 'Yes' or 'No'." is output (step 439).

[0419] (5) Unrecognizable user utterance (step 431): The user's utterance after the confirmation prompt may be unrecognizable, that is, none of "positive word", "negative word" and "not input". In that case the flow likewise moves to the prompt that shows how to reply, and an example prompt such as "Please answer 'Yes' or 'No'." is output (step 439).

[0420] If the user's utterance cannot be recognized while the confirmation prompt is still being output, the dialog flow is not changed.

[0421] FIG. 30 shows the same case as FIG. 29, with one difference. The user utters a "positive word" such as "Yes" while the confirmation prompt is being output (step 447), and then inputs a "negative word" such as "No" after the prompt has been output (step 448). In this case, subsequent prompts omit the confirmation item being output when the "positive word" was recognized, together with any items presented before it.

[0422] Consider the confirmation system prompt "Departing Sapporo, bound for Oki Airport, on July 7: is this flight all right?" (step 426). Suppose the user inputs the "positive word" "Yes" immediately after the "Departing Sapporo" part has been output (step 447). Then "Departing Sapporo", which was presented before the "Yes", is removed from subsequent prompts.

[0423] Accordingly, if a "negative word" such as "No" is input after the confirmation prompt has been output (step 448), the system prompt becomes "Bound for Oki Airport, on July 7: is this flight all right?" (step 449).

[0424] If the user then utters the "positive word" "Yes" (step 450), the flow moves either to the next phase or to the closing prompt of the dialog flow (step 451).

[0425] Next, the procedure for using the time data of the confirmation prompt to change the dialog flow and to switch the recognition grammar is described with reference to FIG. 31.

[0426] (1) Confirmation prompt: The confirmation prompt asks the user to judge whether the apparatus's recognition result is correct. Here it is "Departing Sapporo, bound for Oki Airport, on July 7: is this flight all right?" (step 460).

[0427] (2) Timetable: The timetable is the time data of the synthesized confirmation prompt. It specifies a presentation time for each item of the confirmation prompt, or for each component such as a fixed phrase.

[0428] For example, in FIG. 31 the timetable assigns 2 s (2 seconds) to the departure place a, 4 s to the destination b, 3 s to the date "July 7" c, and 6 s to the fixed phrase d.
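The text gives only the segment durations. If the four segments are assumed to play back to back in the order a, b, c, d, starting at t = 0, the durations give the following playback intervals. These intervals are what allow an utterance time to be matched to a segment:

$$
\begin{aligned}
a &: [0,\ 2)\ \text{s} \\
b &: [2,\ 2+4) = [2,\ 6)\ \text{s} \\
c &: [6,\ 6+3) = [6,\ 9)\ \text{s} \\
d &: [9,\ 9+6) = [9,\ 15)\ \text{s}
\end{aligned}
$$

Under this assumption, the whole confirmation prompt lasts 2 + 4 + 3 + 6 = 15 s, and an utterance at t = 7 s, for example, falls within segment c ("July 7").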

[0429] (3) Time at which the utterance is recognized: Using the timetable of (2), it can be determined at which point in the timetable the user spoke.

[0430] For example, FIG. 31 shows the following user responses:
- "Yes" 461 at time a1 of segment a, while "Departing Sapporo" was being output.
- "Iki" 462 at time b1 of segment b, while "bound for Oki Airport" was being output.
- "Wrong" 463 at time c1 of segment c, while "July 7" was being output.
- "Yes" 464 at time d1 of segment d, while "is this flight all right?" was being output.
- Silence detected at time d2 of segment d.
- An unrecognizable utterance detected at time d3 of segment d.

[0431] (4) Determination of the recognition result: Each recognition result is one of "positive word", "related word", "negative word", "not input" and "unrecognizable". For the utterances above, the results are as follows:
- "Yes" 461 at a1 gives "positive word".
- "Iki" 462 at b1 gives "related word".
- "Wrong" 463 at c1 gives "negative word".
- "Yes" 464 at d1 gives "positive word".
- Silence at d2 gives "not input".
- The unrecognizable utterance at d3 gives "unrecognizable".
While the prompt is being output, switching the recognition grammar also makes it possible to detect "related words".

[0432] (5) Changing the dialog flow: The dialog flow can be changed according to where in the confirmation prompt the user's utterance was recognized and what was recognized.

[0433] For example, if the user says "Yes" 461 at time a1 of time table a, while "departing Sapporo" is being spoken, the recognition result is "positive word" and the confirmation prompt continues (step 467).

[0434] If the user says "Iki" 462 at time b1 of time table b, while "bound for Oki Airport" is being spoken, the recognition result is "related word" and the process moves to the destination state (step 468).

[0435] If the user says "That's wrong" 463 at time c1 of time table c, while "July 7" is being spoken, the recognition result is "negative word" and the process moves to the date state (step 469).

[0436] If the user says "Yes" 464 at time d1 of time table d, while "... is that flight all right?" is being spoken, the recognition result is "positive word" and the process moves to the end prompt (step 470).

[0437] Likewise, if silence from the user is detected at time d2 of time table d, the recognition result is "no input" and the process moves to an example prompt (step 471).

[0438] Likewise, if an unrecognizable utterance is detected at time d3 of time table d, the recognition result is "unrecognizable" and the process also moves to the example prompt (step 471).
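Paragraphs [0433] to [0438] amount to a lookup from the time-table segment and the judgment to the next step of the dialog. Below is a minimal Python sketch under stated assumptions. The state names paraphrase the text and do not quote an implementation. The departure-state rule for segment a is generalized from steps 468 and 469. The fallbacks for combinations the text does not describe are hypothetical.

```python
# Hypothetical state names; step numbers refer to FIG. 31.
ITEM_STATE = {
    "a": "departure state",    # assumption: generalized from steps 468 and 469
    "b": "destination state",  # step 468
    "c": "date state",         # step 469
}


def next_action(segment: str, judgment: str) -> str:
    """Choose the next dialog action from where and what the user said."""
    if segment == "d":  # closing phrase of the confirmation prompt
        if judgment == "positive word":
            return "end prompt"        # step 470
        if judgment in ("no input", "unrecognizable"):
            return "example prompt"    # step 471
        return "restart confirmation"  # hypothetical: not covered by the text
    if judgment in ("negative word", "related word"):
        return ITEM_STATE[segment]     # re-enter the state for the disputed item
    # step 467; by assumption also used for silence in the middle of the prompt
    return "continue confirmation prompt"
```

Keying on the segment means a single "No" is enough to send the dialog straight back to the disputed item instead of re-asking every slot.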

[0439] (6) Switching the recognition grammar
The recognition grammar used in speech recognition is switched so that related words can be recognized. For example, if the airport name in the departure confirmation prompt is wrong, the user may utter another airport name or a keyword such as "departure" during or immediately after output of that prompt. To recognize such an utterance, the recognition grammar is switched according to which part of the confirmation prompt is being output.
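Position-dependent grammar switching can be sketched by treating a grammar as a set of accepted phrases. The active grammar is then rebuilt whenever the prompt moves to a new confirmation item. This is an illustrative assumption only, and the vocabularies below are hypothetical.

```python
from typing import Dict, FrozenSet

# Answers accepted at any point of a confirmation prompt
BASE_GRAMMAR: FrozenSet[str] = frozenset({"yes", "no", "that's wrong"})

# Hypothetical related-word vocabularies, one per confirmation item
RELATED: Dict[str, FrozenSet[str]] = {
    "departure": frozenset({"Sapporo", "Haneda", "Oki", "Iki", "departure"}),
    "destination": frozenset({"Sapporo", "Haneda", "Oki", "Iki", "destination"}),
    "date": frozenset({"July 7", "July 8", "date", "tomorrow"}),
}


def active_grammar(item: str) -> FrozenSet[str]:
    """Grammar in force while the prompt is speaking the given item."""
    return BASE_GRAMMAR | RELATED.get(item, frozenset())


def accept(phrase: str, item: str) -> bool:
    """True if the phrase is in the grammar active for the current item."""
    return phrase in active_grammar(item)
```

Limiting related words to the item currently being spoken keeps each active grammar small. That is one plausible reason to switch by prompt position rather than load every vocabulary at once.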

[0440] Item (6) of FIG. 31 shows the case in which the grammar is switched for part c of the time table, that is, for the date.

[0441] FIG. 32 shows an example data format of the frame information that is stored in the frame information storage unit 348 and used to select the dialog flow.

[0442] The frame information 480 contains a confirmation item 481, a recognition word 482, and a recognition status 483.

[0443] The confirmation item 481 is an item that requires the user's confirmation: once the user's utterance for it has been recognized, the result must be verified as correct.

[0444] The recognition word 482 is the character string obtained as the result of speech recognition.

[0445] The recognition status 483 indicates the status of the recognition word 482 for the confirmation item 481. It takes one of three values: "confirmed", "re-input request", or "unconfirmed".

[0446] "Confirmed" applies when the recognition result is judged a "positive word"; "re-input request" applies when it is judged a "negative word" or a "related word"; and "unconfirmed" applies when it is judged "no input" or "unrecognizable".

[0447] (1) For example, if the user says "yes" while "Sapporo Airport" is being output in the confirmation prompt, the status is "confirmed".
(2) If the user says "no" while "Oki Airport" is being output in the confirmation prompt, the status is "re-input request".
(3) If the user says nothing after the confirmation prompt has been output, the status is "unconfirmed".
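The frame information of FIG. 32 and the status rules of [0446] can be represented with a small record type. Below is a minimal Python sketch. The field names paraphrase items 481 to 483. The helper that lists unresolved items is a hypothetical illustration of how a dialog flow manager might use the frame.

```python
from dataclasses import dataclass
from typing import List, Optional

# Status rules from [0446]
STATUS_BY_JUDGMENT = {
    "positive word": "confirmed",
    "negative word": "re-input request",
    "related word": "re-input request",
    "no input": "unconfirmed",
    "unrecognizable": "unconfirmed",
}


@dataclass
class FrameInfo:
    confirmation_item: str                   # 481: item the user must confirm
    recognition_word: Optional[str] = None   # 482: recognizer output string
    recognition_status: str = "unconfirmed"  # 483

    def apply(self, judgment: str) -> None:
        """Update the recognition status from a judgment category."""
        self.recognition_status = STATUS_BY_JUDGMENT[judgment]


def unresolved(frames: List[FrameInfo]) -> List[str]:
    """Hypothetical helper: items the dialog still has to settle."""
    return [f.confirmation_item for f in frames if f.recognition_status != "confirmed"]
```

Applied to the three cases of [0447], a "yes" during "Sapporo Airport" leaves that item confirmed. A "no" during "Oki Airport" puts the arrival item into "re-input request". An item the user never answered stays "unconfirmed".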

[0448] As described above, this embodiment monitors the state of the dialog between the device and the user and changes the question format accordingly, achieving a natural dialog close to a conversation with a person. This provides the following effects.

[0449] (1) Because the question format changes with the state of the dialog between the device and the user, the time spent on confirmation is shortened.
(2) For the same reason, voice guidance appropriate to the state of the dialog can be provided.
(3) Because the user can proceed without confusion, the burden on the user is reduced.

[0450]

[Effects of the Invention] As described above, the present invention provides a speech recognition response device that recognizes and responds to utterances from the other party on a telephone. The device comprises utterance state detection means for detecting the utterance state of the other party, and utterance control means for controlling utterances to the other party based on the utterance state detected by the utterance state detection means. This provides the following effects.

[0451] (1) The utterance state of the other party, such as the user's proficiency and environment level, is detected, and the voice guidance, the dialog, and the dialog flow are changed accordingly. The device can therefore respond appropriately to the user's proficiency, the surrounding environment, and the line condition. This achieves a natural dialog that does not burden the user and shortens call time.
(2) The dialog device does not stress the user, and the telephone line is used more efficiently.
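Claim 5 states that the environment level is detected from three quantities: the strength of the caller's background noise, the strength of the caller's speech, and the difference between the two. Below is a minimal Python sketch. It assumes float samples in [-1, 1], levels measured as RMS in dB relative to full scale, and hypothetical thresholds. The patent does not give the actual measure or threshold values.

```python
import math
from typing import Sequence


def level_db(samples: Sequence[float]) -> float:
    """RMS level in dB relative to full scale (1.0)."""
    if not samples:
        raise ValueError("empty signal")
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return 20.0 * math.log10(max(rms, 1e-12))  # floor avoids log10(0)


def environment_level(speech: Sequence[float], noise: Sequence[float],
                      loud_noise_db: float = -10.0,
                      quiet_speech_db: float = -40.0,
                      good_snr_db: float = 20.0,
                      fair_snr_db: float = 10.0) -> str:
    """Classify the caller's environment as "good", "fair" or "poor".

    All thresholds are hypothetical defaults, not values from the patent."""
    speech_db = level_db(speech)
    noise_db = level_db(noise)
    if noise_db > loud_noise_db or speech_db < quiet_speech_db:
        return "poor"  # noise strength or speech strength alone rules it out
    snr_db = speech_db - noise_db  # difference of utterance and noise strength
    if snr_db >= good_snr_db:
        return "good"
    if snr_db >= fair_snr_db:
        return "fair"
    return "poor"
```

The resulting label could then feed the choice of voice guidance content and speed, as claims 3 and 6 describe. The exact mapping from label to guidance is not specified here.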

[0452] The present invention also provides a speech recognition response device that recognizes and responds to utterances from the other party on a telephone. The device comprises three means: utterance content recognition means for recognizing the content of the other party's utterance; utterance content confirmation means for confirming the content recognized by the utterance content recognition means; and utterance content change means for changing the content of utterances to the other party based on the confirmation result of the utterance content confirmation means. This provides the following effects.

[0453] (1) The same system prompt is repeated as little as possible during the dialog, so the user is not made uncomfortable.
(2) Instead of repeating the same question, related questions are asked to narrow down the recognition result. The resulting higher recognition accuracy reduces dialog time.
(3) Based on how the user responds, the device acts on the user to create conditions in which the utterance is easier to recognize. The resulting higher recognition accuracy likewise reduces dialog time.
(4) Natural-sounding number read-back prompts are generated, so easy-to-hear system prompts can be produced at low cost.
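Claim 19 describes how such read-back prompts are assembled from recordings. There are one-digit prompts for 0 to 9 and two-digit prompts for 00 to 99. Even-length numbers are built only from two-digit prompts, and odd-length numbers use exactly one one-digit prompt. Below is a minimal Python sketch. The claim does not say where the single digit goes, so placing it first is an assumption. The file-naming scheme and function names are also hypothetical.

```python
from typing import List

DIGITS = set("0123456789")


def split_for_prompts(number: str) -> List[str]:
    """Split a digit string into chunks that each map to one recorded prompt."""
    if not number or not set(number) <= DIGITS:
        raise ValueError(f"not a digit string: {number!r}")
    chunks: List[str] = []
    start = 0
    if len(number) % 2 == 1:
        chunks.append(number[0])  # assumption: the single one-digit prompt leads
        start = 1
    chunks.extend(number[i:i + 2] for i in range(start, len(number), 2))
    return chunks


def prompt_files(number: str) -> List[str]:
    """Hypothetical file names for the concatenated read-back prompt."""
    return [f"{'two' if len(c) == 2 else 'one'}_digit/{c}.wav"
            for c in split_for_prompts(number)]
```

With this scheme a four-digit flight number such as 1234 is played from two recordings ("12", "34") rather than four single digits. Fewer joins are a likely reason the prompt sounds more natural while only 110 recordings are needed.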

[0454] The present invention further provides a speech recognition response device that recognizes utterances from the other party on a telephone and responds with a confirmation. The device comprises four means:
- confirmation prompt creation means for creating a confirmation prompt that divides the other party's utterance by confirmation item;
- utterance time setting means for setting the utterance time of each confirmation item in the confirmation prompt;
- utterance content detection means for detecting the content of any input the other party makes during an utterance time set by the utterance time setting means;
- input time detection means for detecting the time at which that input was made.
The confirmation response is based on the input time detected by the input time detection means and the utterance content detected by the utterance content detection means. This provides the following effects.

[0455] (1) Because the question format changes with the state of the dialog between the device and the user, the time spent on confirmation is shortened.
(2) For the same reason, voice guidance appropriate to the state of the dialog can be provided.
(3) Because the user can proceed without confusion, the burden on the user is reduced.

[Brief description of the drawings]

FIG. 1 is a schematic block diagram showing the overall configuration of a speech recognition response device according to a first embodiment to which the present invention is applied.

FIG. 2 is an explanatory diagram of the frames and semantic information stored in the frame/semantic dictionary shown in FIG. 1.

FIG. 3 is an explanatory diagram of a case in which the recognition result determination device uses the slot constraints shown in FIG. 2 to determine one recognition result from a plurality of recognition result candidates.

FIG. 4 is a diagram explaining how the proficiency determination device shown in FIG. 1 determines proficiency.

FIG. 5 is a diagram explaining how the environment level determination device shown in FIG. 1 determines the environment level.

FIG. 6 is a diagram explaining how the voice guide selection device shown in FIG. 1 selects a voice guide.

FIG. 7 is a flowchart showing the processing procedure of the first embodiment to which the present invention is applied.

FIG. 8 is a diagram explaining the interaction among the components shown in FIG. 1.

FIG. 9 is an explanatory diagram of conventional voice guidance for reserving an airline ticket departing from Haneda for Sapporo at 5:30 p.m. on December 31, 1998.

FIG. 10 is an explanatory diagram of a case in which misrecognition occurs in the voice guidance of FIG. 9.

FIG. 11 is an explanatory diagram of another case in which misrecognition occurs in the voice guidance of FIG. 9.

FIG. 12 is a configuration diagram showing the overall processing flow of the second embodiment.

FIG. 13 is a flowchart explaining the processing procedure of an aircraft arrival/departure information service in the speech recognition response device according to the first embodiment to which the present invention is applied.

FIG. 14 is likewise a flowchart explaining the processing procedure of the aircraft arrival/departure information service in the speech recognition response device according to the first embodiment to which the present invention is applied.

FIG. 15 is likewise a flowchart explaining the processing procedure of the aircraft arrival/departure information service in the speech recognition response device according to the first embodiment to which the present invention is applied.

FIG. 16 is likewise a flowchart explaining the processing procedure of the aircraft arrival/departure information service in the speech recognition response device according to the first embodiment to which the present invention is applied.

FIG. 17 is a diagram showing the relationship between the user's utterance volume and the system prompt volume: the system prompt volume is raised when the user speaks loudly and lowered when the user speaks softly.

FIG. 18 is a table showing the relationship between the user's utterance volume and the system prompt volume: the system prompt volume is raised when the user speaks loudly and lowered when the user speaks softly.

FIG. 19 is a diagram explaining the process of matching airport names with prefecture names.

FIG. 20 is a flowchart of the processing used after recognition failure or misrecognition has been repeated a predetermined number of times. The user's earlier utterances are used in a read-back prompt so that the user notices the problem with their own speech, which prevents further recognition failures and misrecognitions.

FIG. 21 is a flowchart explaining, with area codes as an example, a method that makes the desired data easier to obtain when each category contains a very large number of entries.

FIG. 22 is an explanatory diagram of the area code dictionaries. It shows three kinds:
- a nationwide area code dictionary in which all 2- to 5-digit area codes used in Japan are registered;
- prefecture-level dictionaries in which area codes are registered by prefecture;
- municipality-level dictionaries, one per prefecture, in which area codes are registered in association with the names of the municipalities in that prefecture.

FIG. 23 is a block diagram showing the overall configuration of a speech recognition response device according to a third embodiment.

FIG. 24 is a flowchart showing the processing procedure of the third embodiment to which the present invention is applied.

FIG. 25 is likewise a flowchart showing the processing procedure of the third embodiment to which the present invention is applied.

FIG. 26 is likewise a flowchart showing the processing procedure of the third embodiment to which the present invention is applied.

FIG. 27 is likewise a flowchart showing the processing procedure of the third embodiment to which the present invention is applied.

FIG. 28 is a diagram explaining the interaction among the components shown in FIG. 23.

FIG. 29 is an explanatory diagram of a case in which the dialog flow is changed based on the user's utterance time and content determination information.

FIG. 30 is likewise an explanatory diagram of a case in which the dialog flow is changed based on the user's utterance time and content determination information.

FIG. 31 is a flowchart showing the processing procedure for changing the dialog flow using the time data of the confirmation prompt and for switching the recognition grammar.

FIG. 32 is a diagram showing an example data format of the frame information that is stored in the frame information storage unit 348 and used to select the dialog flow.

[Explanation of symbols]

10 Speech recognition response device
11 Voice input device
12 Speech waveform feature extraction device
13 Speech recognition device
14 Recognition result determination device
15 Proficiency determination device
16 Environment level determination device
17 Dialog control device
18 Voice guide selection device
19 Waveform feature analysis device
20 Voice output device
21 Voice guide dictionary
22 Recognition dictionary
23 Waveform feature database (waveform feature DB)
24 Frame/semantic dictionary
25 Recognition result database (recognition result DB)
26 Dialog history database (dialog history DB)
30 Airline ticket reservation frame
31 Semantic information
31a Inter-airport-slot constraint
31b Date slot constraint
31c Time slot constraint
35 User utterance (the user's actual utterance)
36 Recognition processing
37 Recognition determination processing
38 Application (application body)
40 Voice guide rules
41 Voice guide database (voice guide DB)
201 Service selection state
202 Selected service confirmation state
203 Yes/No state
204 Recognition result finalization unit
205 Previous-state user utterance volume check unit
206 System prompt volume adjustment unit
207 Service branching unit
208 Flight number recognition state
209 Flight number confirmation state
210 Number pronunciation prompt generation unit
211 Recognition result finalization unit
212 Flight status information unit
213 Departure airport name recognition state
214 Departure airport name confirmation state
215 Prefecture name recognition state
216 Airport name confirmation state
217 Recognition result finalization unit
218 Arrival airport name recognition state
219 Arrival airport name confirmation state
220 Prefecture name recognition state
221 Airport name confirmation state
222 Recognition result finalization unit
290 Relation table between user utterance volume and system prompt volume
291 Recognized airport name candidates
292, 293, 294 Airport/location relation tables
295 Utterance
310 Nationwide area code dictionary
311 Hokkaido area code list dictionary
312 Tokyo area code list dictionary
313 Osaka Prefecture area code list dictionary
314 Kagoshima Prefecture area code list dictionary
315 Hokkaido municipality name/area code association dictionary
316 Tokyo municipality name/area code association dictionary
317 Osaka Prefecture municipality name/area code association dictionary
318 Kagoshima Prefecture municipality name/area code association dictionary
340 Speech recognition response device
341 Voice input device
342 Speech recognition device
343 Recognized word input time monitoring device
344 Recognition result determination device
345 Dialog flow management device
346 Voice prompt synthesis device
347 Voice output device
348 Frame information storage unit
349 Dialog flow database (dialog flow DB)
350 Prompt database (prompt DB)
351 Output prompt storage unit
480 Frame information
481 Confirmation item
482 Recognition word
483 Recognition status

(Continued from front page)
(51) Int.Cl.7 / FI / Theme code (reference): H04M 3/42 P
(72) Inventor: Koji Omoto, c/o Omron Corporation, 10 Hanazono Todo-cho, Ukyo-ku, Kyoto-shi, Kyoto
(72) Inventor: Hiroshi Nakajima, c/o Omron Corporation, 10 Hanazono Todo-cho, Ukyo-ku, Kyoto-shi, Kyoto
F-terms (reference): 5D015 KK02 KK04 LL00 LL04 LL06; 5D045 AB01 AB24; 5K024 AA75 BB01 BB02 CC01 DD01 EE09 FF06 GG00 GG01 HH00; 9A001 HH17 HH18 JJ62 KK56


[Claims]

1. A speech recognition response device for recognizing and responding to utterances from the other party on a telephone, comprising: utterance state detection means for detecting the utterance state of the other party; and utterance control means for controlling utterances to the other party based on the utterance state of the other party detected by the utterance state detection means.

2. The speech recognition response device according to claim 1, wherein the utterance state detection means comprises: proficiency detection means for detecting the proficiency of the other party's utterance; and environment level detection means for detecting the environment level of the other party's utterance.

3. The speech recognition response device according to claim 1, wherein the utterance control means controls utterance content and utterance speed.

4. The speech recognition response device according to claim 2, wherein the proficiency detection means detects the proficiency based on the time until the other party responds to an utterance from the device and the response time of the other party.

5. The speech recognition response device according to claim 2, wherein the environment level detection means detects the environment level based on the strength of the other party's environmental noise, the strength of the other party's utterance, and the difference between the strength of the other party's utterance and the strength of the other party's environmental noise.

6. A speech recognition response device for recognizing and responding to utterances from the other party on a telephone, comprising: speech recognition means for recognizing the other party's utterance as character information; proficiency detection means for detecting the proficiency of the recognized utterance of the other party; environment level detection means for detecting the environment level of the other party's utterance; and utterance content changing means for changing the content of utterances from the device based on the proficiency and environment level of the other party's utterance.

7. The speech recognition response device according to claim 6, wherein the speech recognition means comprises recognition candidate extraction means for extracting a plurality of recognition candidates and a rule for extracting one confirmed candidate from the plurality of recognition candidates extracted by the recognition candidate extraction means, and extracts one confirmed candidate from the plurality of recognition candidates using the rule.

8. A speech recognition response method for recognizing and responding to utterances from the other party on a telephone, comprising: detecting the utterance state of the other party; and controlling utterances to the other party based on the detected utterance state of the other party.

9. The speech recognition response method according to claim 8, wherein detecting the utterance state includes detecting the proficiency of the other party's utterance and detecting the environment level of the other party's utterance.

10. The speech recognition response method according to claim 8, wherein controlling the utterance includes controlling utterance content and utterance speed.

11. The speech recognition response method according to claim 9, wherein the proficiency is detected based on the time until the other party responds to an utterance from the device and the response time of the other party.

12. The speech recognition response method according to claim 9, wherein the environment level is detected based on the strength of the other party's environmental noise, the strength of the other party's utterance, and the difference between the strength of the other party's utterance and the strength of the other party's environmental noise.

13. A speech recognition response method for recognizing and responding to utterances from the other party on a telephone, comprising: recognizing the other party's utterance as character information; detecting the proficiency of the recognized utterance of the other party; detecting the environment level of the other party's utterance; and changing the content of utterances from the device based on the proficiency and environment level of the other party's utterance.

14. The speech recognition response method according to claim 13, wherein the speech recognition includes extracting a plurality of recognition candidates and a rule for extracting one confirmed candidate from the plurality of extracted recognition candidates, and one confirmed candidate is extracted from the candidates using the rule.

15. A speech recognition response device for recognizing and responding to utterances from the other party on a telephone, comprising: utterance content recognition means for recognizing the content of the other party's utterance; utterance content confirmation means for confirming the content of the other party's utterance recognized by the utterance content recognition means; and utterance content change means for changing the content of utterances to the other party based on the confirmation result of the utterance content confirmation means.

16. The speech recognition response device according to claim 15, wherein the utterance content confirmation means comprises recognition target candidate detection means for detecting a plurality of recognition target candidates and related recognition target acquisition means for acquiring a related recognition target that includes the recognition target. When the content of the other party's utterance cannot be confirmed, the utterance content confirmation means specifies that content by matching the plurality of recognition target candidates detected by the recognition target candidate detection means against the related recognition target acquired by the related recognition target acquisition means.

17. The speech recognition response device according to claim 15, wherein, when the other party's utterance contains the pronunciation "shi" indicating 7, the utterance content confirmation means performs the confirmation by replacing it with "nana".

18. The speech recognition response device according to claim 15, wherein the utterance content confirmation means has utterance storage means for storing the other party's utterances and, when the content of the other party's utterance cannot be confirmed a predetermined number of times or more, includes the stored utterance of the other party in the utterance content to be confirmed.

19. A speech recognition response device that utters speech including numbers in response to an utterance from the other party, comprising: one-digit voice prompt storage means for storing one-digit voice prompts from 0 to 9; and two-digit voice prompt storage means for storing two-digit voice prompts from 00 to 99. A number with an even number of digits is uttered using only the two-digit voice prompts stored in the two-digit voice prompt storage means. A number with an odd number of digits is uttered using only one of the one-digit voice prompts stored in the one-digit voice prompt storage means.

20. A speech recognition response device that extracts the other party's uttered item from a plurality of items based on an utterance from the other party on a telephone, comprising: first registration means for registering the plurality of items; second registration means for dividing the items registered by the first registration means into groups having a specific attribute and relationship and registering them; and relationship extraction means for extracting the relationship. When the other party's uttered item cannot be extracted using the first registration means, the relationship extraction means extracts the relationship under the specific attribute that includes the uttered item. The other party's uttered item is then extracted from the second registration means based on the relationship under that specific attribute.

21. A speech recognition response device for recognizing and responding to utterances from the other party on a telephone, comprising: volume detection means for detecting the other party's volume from the other party's utterance; and response volume control means for controlling the response volume based on the other party's volume detected by the volume detection means.

22. The voice recognition response device according to claim 21, wherein the response volume control means increases the response volume as the volume of the other party increases and decreases the response volume as the volume of the other party decreases.
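Claims 21 and 22 only require that the response volume track the caller's volume in the same direction. A minimal sketch, assuming the caller's level is measured as RMS in dBFS and mapped linearly with a clamp; the function names and every numeric parameter (`ref_db`, `base_db`, `gain`, limits) are hypothetical choices, not values from the patent.

```python
import math


def rms_dbfs(samples: list[float]) -> float:
    """Level of the caller's utterance in dBFS; samples lie in [-1.0, 1.0]."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(rms) if rms > 0 else float("-inf")


def response_volume(caller_db: float, ref_db: float = -20.0,
                    base_db: float = -20.0, gain: float = 0.5,
                    lo_db: float = -35.0, hi_db: float = -6.0) -> float:
    """Map caller level to playback level (all parameters hypothetical).

    A positive gain makes the response louder for a louder caller and
    quieter for a quieter one; the clamp keeps it inside a safe range.
    """
    target = base_db + gain * (caller_db - ref_db)
    return max(lo_db, min(hi_db, target))


caller = rms_dbfs([0.05, -0.05, 0.05, -0.05])  # about -26 dBFS
print(round(response_volume(caller), 1))       # -23.0
```

A gain below 1 softens the adjustment, so a shouting caller gets a somewhat louder reply rather than an equally loud one; the clamp also handles silence, where the measured level is negative infinity.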

23. A voice recognition response method for recognizing and responding to an utterance from the other party on a telephone, comprising the steps of: recognizing the utterance content of the other party; confirming the recognized utterance content with the other party; and changing the content of the utterance to the other party based on the result of the confirmation.

24. The voice recognition response method according to claim 23, wherein the step of confirming the utterance content includes a step of detecting a plurality of recognition target candidates and a step of acquiring related recognition targets that include the recognition target, and, when the utterance content of the other party cannot be confirmed, the utterance content of the other party is specified by matching the plurality of recognition target candidates detected in the candidate detection step against the related recognition targets acquired in the acquisition step.
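The matching step of claim 24 can be sketched as intersecting the recognizer's ranked candidates with the set of related targets and keeping the best-ranked survivor. The function name `resolve_by_relation`, the N-best list as input format, and the place names are assumptions for illustration only.

```python
from typing import Optional, Sequence


def resolve_by_relation(n_best: Sequence[str],
                        related: set[str]) -> Optional[str]:
    """Pick the best-ranked candidate that is also a related target.

    n_best: recognizer hypotheses, best first (hypothetical input format).
    related: targets linked to something already confirmed, e.g. the
    districts of a city the caller has named (hypothetical example).
    """
    for candidate in n_best:
        if candidate in related:
            return candidate
    return None


n_best = ["Ikeda", "Ikuta", "Ikeno"]          # hypothetical hypotheses
related = {"Ikuta", "Sumiyoshi", "Motomachi"}  # hypothetical related targets
print(resolve_by_relation(n_best, related))    # Ikuta
```

The top hypothesis "Ikeda" is rejected because it is not among the related targets, so the second-ranked "Ikuta" is taken instead of asking the caller to repeat.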

25. The voice recognition response method according to claim 23, wherein, in the step of confirming the utterance content, when the utterance of the other party contains the pronunciation “Shi” indicating 7, the confirmation is performed with that pronunciation replaced by “Nana”.
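A minimal sketch of the read-back in claim 25, assuming the recognizer returns romanized digit readings: whatever reading the caller used for 7, the confirmation speaks "nana". A plausible motivation (not stated in this section) is that "shichi" is easily confused with "ichi" (1). The tables and function names are hypothetical, and the readings for the other digits, including "yon" for 4, are ordinary choices made for this sketch.

```python
# Caller readings a recognizer might return, mapped to digits (assumption).
HEARD_TO_DIGIT = {"zero": "0", "ichi": "1", "ni": "2", "san": "3",
                  "yon": "4", "go": "5", "roku": "6", "shichi": "7",
                  "nana": "7", "hachi": "8", "kyuu": "9"}

# Readings used for confirmation: 7 is always read back as "nana".
READBACK = {"0": "zero", "1": "ichi", "2": "ni", "3": "san", "4": "yon",
            "5": "go", "6": "roku", "7": "nana", "8": "hachi", "9": "kyuu"}


def confirm_digits(heard: list[str]) -> str:
    """Convert the caller's readings to digits, then read them back."""
    digits = "".join(HEARD_TO_DIGIT[token] for token in heard)
    return " ".join(READBACK[d] for d in digits)


print(confirm_digits(["zero", "shichi", "ni", "shichi"]))  # zero nana ni nana
```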

26. The voice recognition response method according to claim 23, wherein the step of confirming the utterance content includes storing the utterance of the other party, and, when the utterance content of the other party cannot be confirmed a predetermined number of times or more, the utterance content to be confirmed includes the stored utterance of the other party.
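Claims 16 and 26 can be read as: keep the caller's recordings, and once confirmation has failed a set number of times, build the next confirmation prompt around the stored utterance. The sketch below assumes that "includes the stored utterance" means playing the caller's own most recent recording back inside the prompt; that reading, the class `ConfirmationState`, and the threshold of 2 are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ConfirmationState:
    """Tracks failed confirmations for one item (hypothetical class)."""
    max_failures: int = 2  # stands in for the "predetermined number of times"
    failures: int = 0
    stored: list[bytes] = field(default_factory=list)  # caller recordings

    def record_attempt(self, audio: bytes, confirmed: bool) -> None:
        if confirmed:
            self.failures = 0
            self.stored.clear()
        else:
            self.failures += 1
            self.stored.append(audio)

    def build_prompt(self, recognized: str) -> list[object]:
        """Prompt segments: text strings, or raw audio to play back."""
        segments: list[object] = []
        if self.failures >= self.max_failures and self.stored:
            # Include the caller's own stored utterance in the confirmation.
            segments += ["You said", self.stored[-1]]
        segments.append(f"Is that {recognized}?")
        return segments


state = ConfirmationState()
state.record_attempt(b"<audio 1>", confirmed=False)
state.record_attempt(b"<audio 2>", confirmed=False)
print(state.build_prompt("Kobe"))  # ['You said', b'<audio 2>', 'Is that Kobe?']
```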

27. A voice recognition response method in which a voice including a number is uttered in response to an utterance from the other party on a telephone, using one-digit voice prompt storage means for storing one-digit voice prompts from 0 to 9 and two-digit voice prompt storage means for storing two-digit voice prompts from 00 to 99, wherein a number having an even number of digits is uttered using only the two-digit voice prompts stored in the two-digit voice prompt storage means, and a number having an odd number of digits is uttered using exactly one one-digit voice prompt stored in the one-digit voice prompt storage means, the remaining digits being uttered with the two-digit voice prompts.

28. A voice recognition response method for extracting the utterance item of the other party from a plurality of items based on an utterance from a caller, comprising: a first registration step of registering the plurality of items; a second registration step of dividing the items registered in the first registration step into groups of items that share a specific attribute and a relevance, and registering those groups; and a relevance extraction step of extracting the relevance, wherein, when the utterance item of the other party cannot be extracted from the items registered in the first registration step, the relevance in a specific attribute that includes the utterance item is extracted in the relevance extraction step, and the utterance item of the other party is extracted from the items registered in the second registration step based on the relevance in that specific attribute.

29. A voice recognition response method for recognizing and responding to an utterance from the other party on a telephone, comprising the steps of: detecting the volume of the other party from the utterance of the other party; and controlling the response volume based on the detected volume of the other party.

30. The voice recognition response method according to claim 29, wherein the response volume is controlled so that it increases as the volume of the other party increases and decreases as the volume of the other party decreases.

31. A voice recognition response device for recognizing an utterance from the other party on a telephone and responding with a confirmation, comprising: confirmation prompt creation means for dividing the utterance of the other party into confirmation items and creating a confirmation prompt; utterance time setting means for setting an utterance time for each confirmation item in the confirmation prompt; utterance content detection means for detecting, when there is an input from the other party during the utterance time set by the utterance time setting means, the content of the input utterance of the other party; and input time point detection means for detecting the time point at which the utterance of the other party is input, wherein the device responds to the confirmation based on the input time point detected by the input time point detection means and the utterance content of the other party detected by the utterance content detection means.

32. The voice recognition response device according to claim 31, further comprising confirmation prompt changing means for changing the content of the confirmation prompt based on the input time point of the utterance of the other party detected by the input time point detection means and the utterance content of the other party detected by the utterance content detection means.
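Claims 31 and 32 use the moment at which the caller cuts into the confirmation prompt to tell which item is being corrected. A minimal sketch, assuming each item is given a fixed utterance time in seconds and the prompt is spoken items back to back; the class `ConfirmItem`, the helper functions, and the example itinerary are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ConfirmItem:
    name: str        # e.g. "date", "from", "to" (hypothetical items)
    value: str       # value recognized earlier
    duration: float  # seconds allotted to speaking this item


def item_at(items: list[ConfirmItem], t_input: float) -> Optional[int]:
    """Index of the item being spoken at t_input (seconds from prompt start)."""
    start = 0.0
    for i, item in enumerate(items):
        if start <= t_input < start + item.duration:
            return i
        start += item.duration
    return None


def handle_barge_in(items: list[ConfirmItem], t_input: float,
                    spoken: str) -> list[ConfirmItem]:
    """Replace the value of the interrupted item with what the caller said."""
    i = item_at(items, t_input)
    if i is None:  # input outside every item's utterance time: no change
        return items
    updated = list(items)
    updated[i] = replace(items[i], value=spoken)
    return updated


def prompt_text(items: list[ConfirmItem]) -> str:
    """Changed confirmation prompt, rebuilt from the (possibly updated) items."""
    return ", ".join(f"{it.name} {it.value}" for it in items)


items = [ConfirmItem("date", "May 3", 1.5),
         ConfirmItem("from", "Osaka", 1.0),
         ConfirmItem("to", "Sapporo", 1.2)]
fixed = handle_barge_in(items, 2.0, "Kobe")  # caller cut in during "from"
print(prompt_text(fixed))  # date May 3, from Kobe, to Sapporo
```

A deployed system would probably also allow for the caller's reaction time, for example by attributing an interruption that arrives just after an item ends to that item; this sketch leaves that refinement out.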

33. A voice recognition response method for recognizing an utterance of the other party on a telephone and responding with a confirmation, comprising the steps of: creating a confirmation prompt by dividing the utterance of the other party into confirmation items; setting an utterance time for each confirmation item in the confirmation prompt; detecting, when there is an input from the other party during the utterance time set in the utterance time setting step, the content of the input utterance of the other party; detecting the input time point of the input utterance of the other party; and responding to the confirmation based on the input time point detected in the input time point detection step and the utterance content of the other party detected in the utterance content detection step.

34. The voice recognition response method according to claim 33, wherein the content of the confirmation prompt is changed based on the input time point of the utterance of the other party detected in the input time point detection step and the utterance content of the other party detected in the utterance content detection step.