JPH0518118B2

Info

Publication number: JPH0518118B2
Application number: JP59103625A
Authority: JP
Inventors: Hiroyuki Senbon; Yoichi Takebayashi
Original assignee: Tokyo Shibaura Electric Co Ltd
Current assignee: Toshiba Corp
Priority date: 1984-05-24
Filing date: 1984-05-24
Publication date: 1993-03-11
Also published as: JPS60247697A

Description

DETAILED DESCRIPTION OF THE INVENTION

[Technical field of the invention]

The present invention relates to a voice dialogue device used in information processing systems that take voice input.

[Technical background of the invention and its problems]

In recent years, speech recognition and synthesis technology has advanced remarkably. Continuous speech recognition and speaker-independent recognition have become possible, as has highly accurate speech synthesis.

Telephone voice response services that use these technologies to offer various services over the public telephone network, such as bank balance inquiries, have been developed, and their usefulness is attracting attention. The users of such systems are an unspecified large population: some, such as elderly people and children, are unfamiliar with the system, while others are experienced and use it many times a day. Nevertheless, in conventional systems the content (format) of the voice response is fixed, and both the time from the user's voice input to the output of the voice response and the speed of the voice response are constant. Such systems are therefore not easy for every user to handle, and dialogue between human and machine has not been smooth. For example, in a telephone bank balance inquiry service, a user entering the account number "123..." by voice first says "1" on hearing the input request tone ("beep"). About 10 seconds later, the confirmation response "1" is heard. The user then says "2", and so on. For users accustomed to this kind of system the response time is tediously long and irritating, while for users who are not accustomed to it the response content is hard to understand.

[Purpose of the invention]

An object of the present invention is to provide a voice dialogue device that enables smooth dialogue between human and machine.

[Summary of the invention]

The present invention is characterized by comprising: input requesting means for outputting a signal that requests input of a voice signal; input means for inputting a voice signal after the input requesting means has output the input request signal; recognition means for recognizing the voice signal input by the input means; measuring means for measuring the time T1 from the point at which the input requesting means outputs the input request signal to the point at which input of a voice signal by the input means is detected; output means for outputting a voice response signal corresponding to the recognition result of the recognition means; and control means which, when the output means outputs the voice response signal, variably controls, on the basis of the time T1 measured by the measuring means, either the time T3 from the point at which the end of voice signal input by the input means is detected to the point at which the voice response signal is output, or the time T4 taken to output the voice response signal.

[Effect of the invention]

According to the present invention, a response suited to each user can be given, so dialogue between human and machine proceeds smoothly and the device becomes more practical for the user.

[Embodiments of the invention]

Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 is a schematic configuration diagram of a first embodiment of the present invention, FIG. 2 is a schematic diagram of the timing of the voice input request P, the voice input R₁, and the voice response R₂, and FIG. 3 is a processing flow diagram of the first embodiment.

The first embodiment measures the time from the output of a voice input request until the user inputs speech, and controls the voice response according to that time. The blocks inside the dotted line in FIG. 1 constitute a voice recognition and response device whose input and output are connected to a service terminal (not shown). For example, when the user dials a predetermined telephone number from a telephone serving as the service terminal, the voice response output section 7 sends the following response through the handset: "This is the deposit balance inquiry service. When you hear the beep, please say the digits of your account number one at a time." The present invention applies to the exchange of voice input and response from this point on.

First, the voice input start request section 8 in FIG. 1 outputs the voice input request signal P to the terminal (not shown) and, at the same time, sends it to the timing measurement section 3 (step 11 in FIG. 3). The timing measurement section 3 records the time at which it receives the signal P. On hearing the "beep" tone that serves as the voice input request signal, the user says "1" into the handset (step 12 in FIG. 3). This input voice R₁ is fed to the analyzer 1, which performs A/D conversion, spectrum analysis, and similar processing to convert the voice signal into a sequence of feature parameters (a speech pattern) (step 13 in FIG. 3). The speech section detection section 2 uses the energy information of the feature parameter sequence output by the analyzer 1 to detect the start and end points of the speech pattern and cut out the speech section (step 14 in FIG. 3). On detecting the start point and the end point, the speech section detection section 2 sends a start signal and an end signal, respectively, to the timing measurement section 3. The timing measurement section 3 records the time at which it receives the start signal and calculates the time T₁ from receipt of the voice input request signal P to receipt of the start signal (T₁ in FIG. 2, step 15 in FIG. 3).

Meanwhile, the speech section detection section 2 sends the extracted speech pattern (feature parameter sequence) to the speech recognition section 4. The speech recognition section 4 recognizes the input speech pattern using the speech dictionary registered in advance in the dictionary memory 5 (step 16 in FIG. 3), for example by a similarity calculation method. The recognition result is sent to the voice response control section 6 together with the T₁ calculated by the timing measurement section 3. The voice response control section 6 controls the output of the voice response R₂ on the basis of the length of T₁ (step 17 in FIG. 3), using one of the following three methods.

(i) The voice response control section 6 receives from the timing measurement section 3, together with T₁, the time at which the end signal was received. It then variably controls, according to the length of T₁, the time T₃ (T₃ in FIG. 2) from receipt of the end signal to output of the voice response R₂. If T₁ is shorter than a predetermined length, the user spoke as soon as the "beep" tone was heard, which suggests that the user is experienced with the system or in a hurry. The response voice should therefore also be output to the terminal early (step 18 in FIG. 3), so T₃ is made shorter than its default length. If T₁ is longer than the predetermined length, the user spoke well after hearing the "beep" tone, which suggests that the user is unfamiliar with the system or has time to spare. The response voice should therefore be output to the terminal later (step 19 in FIG. 3), so T₃ is made longer than its default length.

(ii) The voice response control section 6 variably controls, according to the length of T₁, the time taken to output the voice response R₂, that is, the response speed (T₄ in FIG. 2). If T₁ is shorter than the predetermined length, R₂ is output at a faster speed for the reason given above; if T₁ is longer, R₂ is output at a slower speed. When R₂ is produced by rule-based synthesis, the speed of the various synthesis parameters (accent, pitch, and so on) is controlled. When R₂ is produced by the recording-and-editing method, the response speed is controlled by selecting pre-recorded words or speech segments of differing speaking rates.

(iii) The voice response control section 6 controls the content (form of expression) of the voice response R₂ according to the length of T₁ (R₂ in FIG. 2). Suppose the user says "1" after hearing the "beep" tone. When the confirmation response R₂ is output, it is simply "1" if T₁ is shorter than the predetermined length, and "That's 1. Understood." if T₁ is longer, for the reasons given above. In other words, the voice response control section 6 receives "1" from the speech recognition section 4 as the recognition result, but varies the form of expression of the voice response that confirms it.

Once the control of T₃, T₄, and R₂ has been determined by methods (i), (ii), and (iii) (step 20 in FIG. 3), the voice response output section 7 outputs the voice response R₂ as instructed by the voice response control section 6 (step 21 in FIG. 3).
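The speech section detection (step 14) and the T₁ calculation (step 15) of the first embodiment can be sketched in Python as follows. This is only a minimal sketch under stated assumptions, not the circuit of FIG. 1: the per-frame energies, the fixed energy threshold, the frame period, and the function names are all hypothetical, and frame 0 is assumed to begin at the instant the input request signal P is output.

```python
from typing import List, Optional, Tuple


def detect_speech_section(energies: List[int], threshold: int) -> Optional[Tuple[int, int]]:
    """Return (start_frame, end_frame) of the first run of frames whose
    energy is at or above the threshold, or None if no frame qualifies.
    The fixed threshold is an assumption; the source only says that
    energy information is used."""
    start = None
    for i, energy in enumerate(energies):
        if energy >= threshold and start is None:
            start = i                      # start point of the speech section
        elif energy < threshold and start is not None:
            return start, i - 1            # end point: last frame above threshold
    if start is not None:
        return start, len(energies) - 1    # speech ran to the end of the buffer
    return None


def measure_t1_ms(energies: List[int], threshold: int, frame_period_ms: int) -> Optional[int]:
    """T1: time from the input request signal P (assumed to coincide with
    frame 0) to the detected start point, in milliseconds."""
    section = detect_speech_section(energies, threshold)
    if section is None:
        return None
    start_frame, _ = section
    return start_frame * frame_period_ms


energies = [1, 2, 1, 9, 12, 10, 2, 1]      # hypothetical frame energies
print(detect_speech_section(energies, threshold=5))
print(measure_t1_ms(energies, threshold=5, frame_period_ms=10))
```

Integer milliseconds are used so that the timing arithmetic stays exact; a practical detector would also need smoothing against short noise bursts, which the source does not describe.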

In this embodiment, as shown in the schematic diagram of FIG. 2, the time T₃ from the voice input R₁ to the voice response R₂, the response time T₄ of R₂, or the form of expression of R₂ is varied according to the time T₁ from the input request signal P to the voice input R₁. For users who are accustomed to the system or in a hurry, the time to the response can be shortened, the response spoken faster, and its content made concise. For users who are unfamiliar with the system or have time to spare, the time to the response can be lengthened, the response spoken more slowly, and its content made more polite. The controls (i), (ii), and (iii) of the voice response control section need not be applied individually; they can also be combined. This makes dialogue between human and machine smoother.
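The combined use of controls (i), (ii), and (iii) can be illustrated with a short Python sketch. It is a sketch under assumptions rather than the behavior of the voice response control section 6 itself: the threshold, the default delay, the scaling factors, the rate multiplier used to stand in for T₄, and the names `ResponsePlan` and `plan_response` are all hypothetical. The source also does not say what happens when T₁ equals the threshold; the defaults are kept here.

```python
from dataclasses import dataclass


@dataclass
class ResponsePlan:
    delay_s: float   # T3: pause between end of input and start of response
    rate: float      # speaking-rate multiplier; a higher rate shortens T4
    text: str        # R2: wording of the confirmation response


def plan_response(t1_s: float, recognized: str,
                  t1_threshold_s: float = 1.0,
                  default_delay_s: float = 1.0) -> ResponsePlan:
    """Choose T3, speaking rate, and wording from T1 (all values assumed)."""
    if t1_s < t1_threshold_s:
        # Quick reply: user seems experienced or in a hurry.
        return ResponsePlan(default_delay_s * 0.5, 1.25, recognized)
    if t1_s > t1_threshold_s:
        # Slow reply: user seems unfamiliar or unhurried.
        return ResponsePlan(default_delay_s * 2.0, 0.8,
                            f"That's {recognized}. Understood.")
    # T1 equal to the threshold: keep the defaults (an assumption).
    return ResponsePlan(default_delay_s, 1.0, recognized)


print(plan_response(0.3, "1"))
print(plan_response(2.0, "1"))
```

Returning all three settings from one function reflects the remark that the controls can be combined; applying only one of them would amount to ignoring the other two fields.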

Next, a second embodiment of the present invention is described with reference to the drawings. FIG. 4 is a schematic configuration diagram of the second embodiment, and FIG. 5 is its processing flow diagram. As shown in FIG. 2, the second embodiment detects both the time T₁ from the input request signal to the start of voice input and the utterance time T₂ of the voice input R₁, and controls the output of the voice response R₂ accordingly.

In the configuration of FIG. 4, the analysis section 1, speech section detection section 2, timing measurement section 3, speech recognition section 4, dictionary memory 5, voice response control section 6, voice response output section 7, and voice input start request section 8 are the same as in FIG. 1; an utterance time measurement section 9 is added to them. On detecting the start and end points of the input speech pattern, the speech section detection section 2 sends the start signal and end signal to the utterance time measurement section 9 as well as to the timing measurement section 3. The utterance time measurement section 9 determines the time T₂ from receipt of the start signal to receipt of the end signal (step 22 in FIG. 5). The voice response control section 6 receives T₁ from the timing measurement section 3 and T₂ from the utterance time measurement section 9. It compares T₂ with a predetermined length (step 23 in FIG. 5) and controls the output of the voice response according to that result and the result of the T₁ comparison described above. If the utterance time T₂ is shorter than the predetermined length, the user is taken to have spoken quickly because of familiarity with the system or haste, and, as described above, the times T₃ and T₄ shown in FIG. 2 are shortened or the content of the voice response R₂ is made concise (step 24 in FIG. 5). If T₂ is longer than the predetermined length, the user is taken to have spoken slowly because of unfamiliarity with the system or having time to spare, and T₃ and T₄ are lengthened or the content of R₂ is made more polite (step 25 in FIG. 5).
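One way to combine the T₁ and T₂ comparisons of the second embodiment is sketched below in Python. The source states only that the response is controlled according to both comparison results; the scoring rule used here (each comparison votes, and mixed votes fall back to the default style), the thresholds, and the names `classify` and `select_style` are assumptions made for illustration.

```python
def classify(value_s: float, threshold_s: float) -> int:
    """Return -1 if the value is shorter than the threshold,
    +1 if longer, and 0 if equal."""
    if value_s < threshold_s:
        return -1
    if value_s > threshold_s:
        return 1
    return 0


def select_style(t1_s: float, t2_s: float,
                 t1_threshold_s: float = 1.0,
                 t2_threshold_s: float = 0.5) -> str:
    """Combine the T1 (time to start speaking) and T2 (utterance time)
    comparisons into one response style. The voting rule is an assumption."""
    score = classify(t1_s, t1_threshold_s) + classify(t2_s, t2_threshold_s)
    if score < 0:
        return "brisk"     # shorten T3 and T4, concise wording
    if score > 0:
        return "polite"    # lengthen T3 and T4, more polite wording
    return "default"       # comparisons disagree or are neutral


print(select_style(0.3, 0.2))
print(select_style(2.0, 0.9))
print(select_style(0.3, 0.9))
```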

Thus, according to the second embodiment, the times T₁ and T₂ shown in FIG. 2 are measured and the output of the voice response R₂ is controlled according to the results. Compared with the first embodiment, the response reflects the user's temperament and circumstances at the time of speaking more closely, which makes the dialogue between user and machine even more natural.

In the first and second embodiments described above, the voice input start request signal P is output from the voice input start request section 8. It can instead be output from the voice response output section 7, so that the response voice and the input request voice are output as one continuous utterance. The user's utterances and the machine's responses then follow one another in succession (dotted line in the flow of FIG. 5). FIG. 6 is a schematic diagram of the timing of response voices that include an input request and of the input voices. In this figure, R₀, R₂, R₄, and R₆ are response voices that each include an input request, and R₁, R₃, and R₅ are input voices from the user. In the balance inquiry service described above, for example, the exchange runs as follows:

R₀ "Please say the digits of your account number one at a time."
R₁ "1"
R₂ "That's 1. Understood. Please say the next digit."
R₃ "2"

Even when the response output method is modified in this way, measuring the times T₁, T₅, T₉ from response voice to input voice and the utterance times T₂, T₆, T₁₀ of the input voices makes it possible, as in the second embodiment, to vary the times T₃, T₇, T₁₁ from input voice to response voice, the utterance times T₂, T₆, T₁₀, T₁₂ of the response voices, and the contents R₀, R₂, R₄, R₆ of the response voices. Modifying the embodiments in this way speeds up voice input and response and also reduces line usage costs, which is of great economic value.
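For the continuous dialogue form, the two quantities measured for each user turn (the time from the end of the response voice to the start of the input voice, and the utterance time of the input voice) can be derived from a timeline of utterances. The Python sketch below assumes a hypothetical timeline of `(speaker, start, end)` tuples in seconds; the tuple layout, the speaker labels, and the function name are not from the source.

```python
from typing import List, Tuple

# (speaker, start_s, end_s); speaker is "machine" or "user" (assumed labels)
Utterance = Tuple[str, float, float]


def user_turn_timings(timeline: List[Utterance]) -> List[Tuple[float, float]]:
    """For each user utterance that follows another utterance, return
    (latency, duration): latency is the gap since the previous utterance
    ended, duration is the length of the user utterance itself."""
    timings = []
    previous_end = None
    for speaker, start_s, end_s in timeline:
        if speaker == "user" and previous_end is not None:
            timings.append((start_s - previous_end, end_s - start_s))
        previous_end = end_s
    return timings


timeline = [
    ("machine", 0.0, 3.0),   # response voice including the input request
    ("user", 3.5, 4.0),      # first digit
    ("machine", 4.5, 7.0),   # confirmation plus next input request
    ("user", 9.0, 10.0),     # second digit, spoken after a longer pause
]
print(user_turn_timings(timeline))
```

Each returned pair can then drive the pause, speed, and wording of the next response voice in the same way as T₁ and T₂ in the second embodiment.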

The present invention is not limited to the embodiments described above. For example, the timing measurement section may measure both T₁ and T₂. The response output may also be changed only after the user's tendencies have been clearly established from the history of the time from input request signal to input voice, that is, after several time measurements. Further, speech rate may be measured instead of utterance time, and the response may be output not only as voice but also through a CRT, printer, or the like. Any of the various known methods may be adopted as appropriate for input speech recognition and for speech synthesis. In short, the present invention can be implemented with various modifications within a scope that does not depart from its gist.

[Brief explanation of the drawing]

FIG. 1 is a schematic configuration diagram of a first embodiment of the present invention; FIG. 2 is a schematic diagram of the timing of an input request, input voice, and response voice; FIG. 3 is a processing flow diagram of the first embodiment; FIG. 4 is a schematic configuration diagram of a second embodiment of the present invention; FIG. 5 is a processing flow diagram of the second embodiment; and FIG. 6 is a schematic diagram of the timing of a conversational, continuous input and response format.

1: analysis section, 2: speech section detection section, 3: timing measurement section, 4: speech recognition section, 5: dictionary memory, 6: voice response control section, 7: voice response output section, 8: voice input start request section, 9: utterance time measurement section.

Claims

[Scope of Claims]

1. A voice dialogue device characterized by comprising: input requesting means for outputting a signal that requests input of a voice signal; input means for inputting a voice signal after the input requesting means has output the input request signal; recognition means for recognizing the voice signal input by the input means; measuring means for measuring the time T1 from the point at which the input requesting means outputs the input request signal to the point at which input of a voice signal by the input means is detected; output means for outputting a voice response signal corresponding to the recognition result of the recognition means; and control means which, when the output means outputs the voice response signal, variably controls, on the basis of the time T1 measured by the measuring means, either the time T3 from the point at which the end of voice signal input by the input means is detected to the point at which the voice response signal is output, or the time T4 taken to output the voice response signal.