JP2556978B2 - Interactive answering machine

Info

Publication number: JP2556978B2
Application number: JP62185363A
Authority: JP
Inventors: 宏之西; 順治小島
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1987-07-27
Filing date: 1987-07-27
Publication date: 1996-11-27
Anticipated expiration: 2011-11-27
Also published as: JPS6430354A

Description

[Detailed description of the invention]

(Field of Industrial Application) The present invention relates to an answering machine, and in particular to an interactive answering machine designed to rationally control the end-of-utterance detection time used to judge whether a caller has finished speaking a message.

(Prior Art) As shown in the flowchart of FIG. 2(a), an answering machine normally detects an incoming call, automatically closes the loop, sends a response message, records the caller's message, and then opens the loop. In this case the caller must speak the entire message at once, and therefore has to organize his or her name, telephone number, business, and so on in a short time before starting to speak. This imposes a considerable psychological burden, with the result that recorded messages are often incomplete and many callers hang up without recording anything. To solve these problems, the interactive answering machine of Japanese Patent Application No. 60-27566, filed by the present applicant, was devised. As shown in FIG. 2(b), it guides the caller through the caller's name, telephone number, message content, and so on, one item at a time in an interactive manner, and records each item.

When messages are recorded interactively, the machine must monitor whether the caller's voice is present and detect whether the caller has finished speaking. In conversation between people, the end of an utterance can be recognized from the meaning of what the other party says, but in human-machine dialogue, recognizing the meaning of human speech in real time is at present extremely difficult. The practical method is therefore, as shown in FIG. 3, to monitor the presence or absence of the caller's voice and to judge that the utterance has ended when the length of a silent interval exceeds a predetermined value (τ in FIG. 3).

In FIG. 3, after the response message has been sent, one loop monitors for the start of the utterance and a second loop monitors for the end of the utterance. The principle of end-of-utterance detection in FIG. 3 is as follows.

Silence observed after the start of an utterance is either a temporary pause or the end of the utterance. Since the silent intervals contained in human speech statistically rarely exceed a certain length (τ), a silence longer than τ can be judged to be the end of the utterance rather than a temporary pause.
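
The fixed-threshold rule described above can be sketched as follows. This is a minimal illustration, not the circuit of the patent: the 100 ms frame length, the per-frame boolean voice-activity input, and the name `detect_end_fixed` are assumptions introduced here.

```python
from typing import Iterable, Optional

FRAME_MS = 100  # assumed frame length; the source does not specify one


def detect_end_fixed(frames: Iterable[bool], tau_ms: int) -> Optional[int]:
    """Return the frame index at which end of utterance is declared.

    frames: per-frame voice activity (True = voice present).
    tau_ms: fixed end-of-utterance detection time in milliseconds.
    Returns None if no end of utterance is detected.
    """
    started = False    # becomes True once the caller starts speaking
    silent_frames = 0  # length of the current run of silence, in frames
    for i, voiced in enumerate(frames):
        if voiced:
            started = True
            silent_frames = 0  # a pause ended; start counting again
        elif started:
            silent_frames += 1
            # Silence longer than tau is treated as end of utterance.
            if silent_frames * FRAME_MS > tau_ms:
                return i
    return None
```

Silence before the caller starts speaking is ignored, and a pause shorter than or equal to `tau_ms` resets the counter as soon as speech resumes.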

(Problems to be Solved by the Invention) However, the conventional method of using a constant value τ as the end-of-utterance detection time has the following problems.

(1) As shown in FIG. 2(b), an interactive answering machine guides the caller through the caller's name, telephone number, message, and so on, one item at a time, and records each item. Comparing, for example, the pauses in speech giving the caller's name with the pauses in speech giving the message, the latter can be expected to be both more numerous and longer. If the same end-of-utterance detection time is used for both, the reliability of detection will not be the same: more detection errors are expected for the message than for the caller's name.

(2) A message rarely ends within a very short time after the caller starts speaking. A silence that occurs immediately after the caller starts speaking (for example, within 1 second) and a silence that occurs after a considerable time has elapsed (for example, after 10 seconds) should therefore differ in the probability that they mark the end of the utterance. Consequently, if a constant end-of-utterance detection time (τ) is used regardless of when the silence occurs, the reliability of detection is expected to vary with the position of the silence.

To solve these problems, the object of the present invention is to make the end-of-utterance detection time variable according to the content of the question and according to the point after the start of the utterance at which the silence is observed.

(Means for Solving the Problems) To solve the above problems, the interactive answering machine comprises: incoming call detection means for detecting a ringing signal from the office line; loop closing means for closing the loop after an incoming call is detected; response message storage means for storing a plurality of response messages; response message sending means for sending response message audio to the line; message recording means for recording the caller's message; message playback means for playing back the message; voice detection means for detecting whether the caller is speaking a message; and end-of-utterance detection means for detecting, on the basis of information from the voice detection means, the end of the utterance when the length of silence observed in the caller's speech reaches a predetermined end-of-utterance detection time. After automatically answering a call, the machine repeats the sending of a response message and the recording of a message in an interactive manner and then releases the office line. In this machine, the end-of-utterance detection time is varied according to the content of the immediately preceding response message. In addition, elapsed time measurement means is provided for measuring, on the basis of information from the voice detection means, the time elapsed since the caller started speaking, and control means is provided for reducing the end-of-utterance detection time in accordance with the elapsed time after the start of the utterance obtained from the elapsed time measurement means.

(Operation) Because the present invention is configured as described above, the problems of the prior art are solved: the end-of-utterance detection time can be varied according to the content of the question and according to the point after the start of the utterance at which the silence is observed.

(Embodiment) An embodiment of the present invention is described below with reference to the drawings.

FIG. 1 is a block diagram showing the configuration of an embodiment of the present invention. In the figure, 1 is an incoming call detection circuit, 2 is a control unit, 3 is a hook switch, 4 is a loop control circuit, 5 is a response message storage unit, 6 is a speech circuit, 7 is a response message sending unit, 8 is a message recording unit, 9 is a voice detection circuit, 10 is end-of-utterance detection means, 11 is elapsed time measurement means, and 12 is a message playback unit.

The operation of the embodiment is described below with reference to the same figure.

First, when a call arrives from the office line, the incoming call detection circuit 1 detects it and notifies the control unit 2. After a predetermined time has elapsed, the control unit 2 activates the loop control circuit 4, which is connected in parallel with the hook switch 3, to close the loop, completing the automatic answering operation.

Next, the control unit 2 sends the first response message, registered in advance in the response message storage unit 5, to the office line by operating the response message sending unit 7, which is connected to the transmit terminal of the speech circuit 6. The control unit 2 then records the caller's speech in the message recording unit 8 in the manner described below.

After the first response message has been sent, the control unit 2 monitors, on the basis of the signal from the voice detection circuit 9, whether the caller is speaking a message, and measures the length of silence whenever the speech is interrupted. The end-of-utterance detection means 10 judges the utterance to have ended at the point when the measured length of silence becomes longer than a predetermined value (the end-of-utterance detection time). For this judgment, the control unit 2 supplies the end-of-utterance detection means 10 with a variable end-of-utterance detection time, as shown in FIG. 4, according to the content of the message and the elapsed time after the start of the utterance obtained from the elapsed time measurement means 11. The values in FIG. 4 depend on the content of the machine's response message that precedes the caller's speech and on the point after the start of the utterance at which the silence under examination began.

For example, when the immediately preceding response message is a question that can be answered in a short time, such as "Excuse me, but may I ask who is calling?", the initial value of the end-of-utterance detection time is made short and is shortened relatively quickly as time passes, as shown by (a) in FIG. 4. On the other hand, when the recording may be long, as with "I will record your message, so please go ahead," the initial value of the end-of-utterance detection time is made long and is decreased gradually, as shown by (b) in the same figure. In other words, the end-of-utterance detection time is controlled according to the content of the utterance and the time that has elapsed since the caller started speaking, so that the end of the utterance is detected accurately and without unnecessary delay. As soon as the end of the utterance is detected, the second response message is sent in accordance with a preset order.
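
The variable detection time of FIG. 4 can be sketched as below. The source gives neither the shape of the curves nor any numbers, so this sketch assumes a linear decrease down to a floor. The `Profile` fields, the two example profiles, the 100 ms frame length, and all millisecond values are hypothetical and chosen only to mimic curves (a) and (b).

```python
from dataclasses import dataclass
from typing import Iterable, Optional

FRAME_MS = 100  # assumed frame length; not specified in the source


@dataclass(frozen=True)
class Profile:
    initial_ms: int       # detection time right after the start of utterance
    drop_ms_per_sec: int  # how fast the detection time shrinks
    floor_ms: int         # lower bound (an assumption, not in the source)


# Hypothetical values: (a) short initial value, fast decrease;
# (b) long initial value, gradual decrease.
SHORT_ANSWER = Profile(initial_ms=1500, drop_ms_per_sec=200, floor_ms=500)
LONG_ANSWER = Profile(initial_ms=3000, drop_ms_per_sec=100, floor_ms=1000)


def detection_time_ms(profile: Profile, elapsed_ms: int) -> int:
    """Detection time for a silence that begins elapsed_ms after utterance start."""
    reduced = profile.initial_ms - profile.drop_ms_per_sec * elapsed_ms // 1000
    return max(profile.floor_ms, reduced)


def detect_end_adaptive(frames: Iterable[bool], profile: Profile) -> Optional[int]:
    """Return the frame index at which end of utterance is declared, else None."""
    started_at: Optional[int] = None    # frame where the caller began speaking
    silence_from: Optional[int] = None  # frame where the current silence began
    threshold_ms = 0
    for i, voiced in enumerate(frames):
        if voiced:
            if started_at is None:
                started_at = i
            silence_from = None
        elif started_at is not None:
            if silence_from is None:
                # Fix the threshold when the silence begins, based on the
                # time elapsed since the start of the utterance.
                silence_from = i
                threshold_ms = detection_time_ms(
                    profile, (i - started_at) * FRAME_MS)
            if (i - silence_from + 1) * FRAME_MS > threshold_ms:
                return i
    return None
```

With these assumed numbers, one second of speech followed by silence ends a short-answer item after 1.4 s of silence, while the same input under the long-answer profile is still treated as a pause after 2 s.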

Thereafter, the interactive recording proceeds by repeating the sending of a response message and the recording of a message as described above. When all the messages have been recorded, the control unit 2 controls the loop control circuit 4 to open the loop, and the machine returns to the idle state.
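
The overall call sequence (close the loop, alternate prompts and recordings in a preset order, open the loop) might be sketched as follows. The `Line` interface and its method names are hypothetical stand-ins for the hardware units of FIG. 1. The `try`/`finally` that releases the line even on error is a design choice of this sketch, not something stated in the source.

```python
from typing import List, Protocol


class Line(Protocol):
    """Hypothetical interface to the telephone hardware (not from the source)."""

    def close_loop(self) -> None: ...
    def open_loop(self) -> None: ...
    def send_prompt(self, prompt: str) -> None: ...
    def record_reply(self, prompt: str) -> bytes: ...


def run_session(line: Line, prompts: List[str]) -> List[bytes]:
    """Handle one call: prompt and record once per item, then release the line."""
    line.close_loop()  # automatic answering
    recordings: List[bytes] = []
    try:
        for prompt in prompts:  # preset order of response messages
            line.send_prompt(prompt)
            # record_reply is assumed to return once end of utterance is detected.
            recordings.append(line.record_reply(prompt))
    finally:
        line.open_loop()  # return to the idle state
    return recordings
```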

When the owner of the machine is present (at home) and plays back the recorded messages, the control unit 2 causes the message playback unit 12 to play back the messages recorded in the message recording unit 8, so that the owner can listen to them.

(Effects of the Invention) As described above, the answering machine of the present invention makes it possible to detect the end of the caller's utterance reliably and promptly. That is, it prevents the machine from cutting in before the caller has finished speaking, and it improves ease of use by shortening the response time.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of an embodiment of the present invention. FIG. 2(a) is an operation flowchart of a conventional answering machine. FIG. 2(b) is an operation flowchart of a conventional interactive answering machine. FIG. 3 is a processing flowchart for detecting the end of an utterance. FIG. 4 is a graph showing the relationship between the elapsed time after the start of an utterance and the end-of-utterance detection time.

1: incoming call detection circuit; 2: control unit; 3: hook switch; 4: loop control circuit; 5: response message storage unit; 6: speech circuit; 7: response message sending unit; 8: message recording unit; 9: voice detection circuit; 10: end-of-utterance detection means; 11: elapsed time measurement means; 12: message playback unit; L₁, L₂: office line.

Claims

(57) [Claims]

1. An interactive answering machine comprising: incoming call detection means for detecting a ringing signal from an office line; loop closing means for closing a loop after an incoming call is detected; response message storage means for storing a plurality of response messages; response message sending means for sending response message audio to the line; message recording means for recording a caller's message; message playback means for playing back the message; voice detection means for detecting whether the caller is speaking a message; and end-of-utterance detection means for detecting, on the basis of information from the voice detection means, the end of an utterance when the length of silence observed in the caller's speech reaches a predetermined end-of-utterance detection time; the machine, after automatically answering a call, repeating the sending of a response message and the recording of a message in an interactive manner and then releasing the office line; characterized in that the end-of-utterance detection time is varied according to the content of the immediately preceding response message, and in that the machine further comprises elapsed time measurement means for measuring, on the basis of information from the voice detection means, the time elapsed since the caller started speaking, and control means for reducing the end-of-utterance detection time in accordance with the elapsed time after the start of the utterance obtained from the elapsed time measurement means.