JPH03276947A

JPH03276947A - Interactive automatic answering telephone set

Info

Publication number: JPH03276947A
Application number: JP2075551A
Authority: JP
Inventors: Kazuhiro Gomi; 五味　和洋; Yutaka Nishino; 豊西野
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1990-03-27
Filing date: 1990-03-27
Publication date: 1991-12-09
Anticipated expiration: 2011-01-29
Also published as: JPH088602B2

Abstract

PURPOSE: To allow each threshold to be set so that voice detection errors do not have a fatal effect, by providing a threshold setting means that can independently set two voice detection level thresholds, one for dialogue response and one for pause compression. CONSTITUTION: A threshold setting means 9A is provided that can independently set two thresholds: a voice detection threshold for dialogue response and a voice detection threshold for pause compression. With this configuration, two voice detection thresholds Vth to be compared with the detected signal level are prepared and calculated separately: a dialogue response threshold Vth-a and a pause compression threshold Vth-b. For dialogue response, the voice detection sensitivity is lowered to prevent sound from being detected when the caller is not speaking; that is, Vth-a is set to a relatively large value. For pause compression, the voice detection sensitivity is raised to prevent silence from being detected when the caller is speaking; that is, Vth-b is set to a relatively small value. Voice detection is performed on the basis of these thresholds.

Description

[Detailed Description of the Invention] [Industrial Field of Application] This invention relates to interactive answering machines, which record the caller's voice while exchanging messages with the caller when a call is answered automatically. More specifically, it concerns interactive answering machines that use pause compression recording, in which silent intervals where the caller is not speaking are omitted when the caller's message is recorded, and it aims to realize both dialogue response and pause compression with good performance.

[Prior Art] In general, an interactive answering machine divides its response message into (1) a part that identifies the called party and (2) a part stating that the called party is away, and sends these parts out at fixed intervals. On hearing part (1), the caller does not realize that an answering machine has answered, so the caller is likely to start speaking. However, to sustain a natural dialogue between the caller and the telephone and to draw out more detailed information without making the caller feel uncomfortable, the response message must be divided into still smaller parts. At the same time, the machine must detect when the caller has finished speaking and synchronize the sending of the subsequent response messages with the end of the caller's utterance.

The end of the caller's utterance can be detected by running an utterance end detection process on the output of a voice detection means applied to the line signal. Fig. 5 shows a concrete example of this process, which is executed immediately after the telephone finishes sending a response message. The process of Fig. 5 is described below. S1 to S10 denote the individual steps.

First, the process checks whether the other party starts speaking. Specifically, the timer is reset and started (S1). If no utterance has begun by the time the silence determination time threshold $T_{\mathrm{mugon}}$ has elapsed since the start of the process, that is, if the voice detection means has not reported sound (S2, S3, S4), the other party is judged to have no intention of speaking (S5).

When the telephone judges that the other party has no intention of speaking, it normally proceeds to send the next response message.

If the start of the other party's utterance is confirmed in step S3, the timer is reset and started (S6), voice detection begins (S7), and the duration of each silent state that appears during the utterance is monitored (S8). An utterance contains silent periods needed for breathing, for thinking about what to say, and so on, and these periods are generally not very long. When the other party has finished speaking, on the other hand, the silent state persists. An utterance end determination time threshold $T_{\mathrm{end}}$ is therefore defined, and the other party is judged to have finished speaking only when a single silent period lasts for $T_{\mathrm{end}}$ or longer (S9, S10). Once the end of the utterance is confirmed, the telephone sends the next response message. This concludes the description of Fig. 5.
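The Fig. 5 procedure can be sketched as a small state machine. This is a minimal illustration, not the patented implementation: it assumes the voice detection result arrives as one boolean per sampling period and that both time thresholds are expressed in sampling periods. The function name `detect_utterance_end` and the return labels are hypothetical.

```python
from typing import Iterable


def detect_utterance_end(voiced: Iterable[bool], t_mugon: int, t_end: int) -> str:
    """Classify a stream of voice-detection results as in Fig. 5.

    Returns "no_speech" if the caller never starts speaking within t_mugon
    periods, "speech_ended" once a single silent run lasts t_end periods
    after speech has started, and "undecided" if the input runs out first.
    """
    timer = 0            # S1: reset/start the timer
    speaking = False
    for is_voiced in voiced:
        if is_voiced:
            speaking = True   # S3: utterance start confirmed
            timer = 0         # S6: restart timing of the silent run
            continue
        timer += 1
        if not speaking and timer >= t_mugon:   # S4
            return "no_speech"                  # S5
        if speaking and timer >= t_end:         # S9
            return "speech_ended"               # S10
    return "undecided"


print(detect_utterance_end([False] * 5, t_mugon=3, t_end=2))
print(detect_utterance_end([False, True, True, False, True, False, False], 3, 2))
```

In the second call the single silent period between the two voiced runs is shorter than `t_end`, so it is treated as a pause inside the utterance rather than as its end.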

Pause compression recording, on the other hand, is a technique that does not record the silent periods between utterances caused by breathing, thinking about what to say, and so on, and stores only the voiced portions on the recording medium. It has the advantages of using the recording medium more efficiently and of shortening the time needed to play back a recorded message.

A typical voice detection means consists of two main parts: a level detection circuit and a comparison section. The level detection circuit extracts the envelope waveform or the power of the input signal. The comparison section stores a voice detection level threshold and compares the output of the level detection circuit with it. If the output of the level detection circuit is larger than the voice detection level threshold, the voice detection result is "voiced"; if it is smaller, the result is "silent". Consequently, if the voice detection level threshold is set high, only signals with a relatively high level are judged to be voiced, whereas if it is set low, even signals with a relatively low level are judged to be voiced.
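The effect of the threshold on the comparison section can be shown with a few numbers. The sketch below is only an illustration: the level values are made up, the function name `detect_voice` is hypothetical, and a level exactly equal to the threshold is treated as silent, which is an assumption because the text does not specify that case.

```python
def detect_voice(levels, threshold):
    """Comparator stage: True (voiced) where the level exceeds the threshold."""
    # Equality is treated as silent here (an assumption, see the lead-in).
    return [level > threshold for level in levels]


levels = [0.1, 0.4, 0.9]           # hypothetical level-detector outputs
print(detect_voice(levels, 0.3))   # lower threshold: the 0.4 signal counts as voiced
print(detect_voice(levels, 0.5))   # higher threshold: only the 0.9 signal counts
```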

In addition, voice detection accuracy can be improved by setting the voice detection level threshold adaptively according to the level of background noise, which varies with line conditions and with the conditions around the caller (see Japanese Patent Application Laid-Open No. 1-307350).

[Problem to Be Solved by the Invention] The caller's voice transmitted over the telephone line is affected by individual differences in speaking level, differences between the telephones callers use, differences in the loss of subscriber lines and relay systems, and the level of ambient noise around the caller and of line noise, so its S/N ratio varies widely. A voice detection circuit is designed with the goal of operating without detection errors even under these conditions, but it is difficult to eliminate detection errors completely. The voice detection circuit should therefore be designed so that, even if a detection error occurs, the error does not have a fatal effect on either the dialogue response function or the pause compression function.

The effects of voice detection errors on dialogue response and pause compression can be classified as follows.

(1) When the voice detection result is "voiced" even though the caller is not speaking, because of line noise or ambient noise around the caller:
- Dialogue response: The dialogue response algorithm judges that the caller has not finished speaking, so it never starts sending the next response message. Also, if an isolated burst of noise is detected as voice while the caller is not speaking at all, the telephone judges that the caller started speaking and stopped immediately, so it proceeds to play the next response message even though the caller has said nothing.

- Pause compression: The pause compression process records the signal during this period even though the caller is not speaking.

(2) When the voice detection result is "silent" even though the caller is speaking, because the caller's speaking level is extremely low, the line loss is extremely large, or the like:
- Dialogue response: The dialogue response algorithm judges that the caller is silent and has not started speaking, and it sends the next response message once the silence determination time $T_{\mathrm{mugon}}$ has elapsed after the previous response message was sent.

- Pause compression: The pause compression process does not record even though the caller is speaking. When this portion is played back, the caller's voice is missing, which makes it difficult to understand what the caller said.

For each of the two processes, dialogue response and pause compression, the question of which detection error has the more fatal effect can be summarized as follows.

Dialogue response: When detection error (1) occurs, the next response message is not played even though the caller has finished speaking, and the process hangs. If an isolated noise is taken for an utterance while the caller is silent, the telephone plays the next response message on its own, so the caller's speech and the telephone's response message are likely to collide.

When detection error (2) occurs, the fixed time $T_{\mathrm{mugon}}$ needed for the silence determination is inserted between the end of one response message and the start of the next. In other words, although the telephone's response message may interrupt the caller's speech, the behavior is equivalent to that of an interactive answering machine of the type that sends response messages one after another at fixed intervals without detecting the caller's speech.

From the above, detection error (1) has the more fatal effect on dialogue response processing. In designing the voice detection circuit, it is therefore safer to set the voice detection threshold high and so reduce the probability of detection error (1).

Pause compression: When detection error (1) occurs, portions that are actually silent are also recorded, so the recording medium is used less efficiently, but everything the caller said is recorded. All of the caller's speech can therefore be heard when the recording is played back.

When detection error (2) occurs, the caller's speech is missing from the recording, so it cannot be heard even when the recording is played back.

From the above, detection error (2) has the more fatal effect on pause compression processing. In designing the voice detection circuit, it is therefore safer to set the voice detection threshold low and so reduce the probability of detection error (2).

As described above, for dialogue response it is desirable to set the voice detection level threshold to a higher value, which reduces the probability of mistaking noise for voice, while for pause compression it is desirable to set the voice detection level threshold to a lower value, which reduces the likelihood of dropped speech. These requirements conflict, however, and cannot both be met with a single voice detection threshold.

[Means for Solving the Problem] The interactive answering machine according to this invention is provided with a threshold setting means that can set the voice detection level threshold used in voice detection processing independently for two purposes: dialogue response and pause compression.

[Operation] In this invention, even if an error occurs in the voice detection result, its effect on dialogue response and on pause compression is not fatal. Both functions can therefore be realized with good performance.

[Embodiment] Fig. 1 shows the block configuration of an embodiment of this invention. In the figure, 1 is a ringing signal detection section that detects the arrival of a ringing signal on the line; 2 is a loop opening/closing section that opens and closes the line; 3 is a speech circuit connected to the line via the loop opening/closing section 2; 4 is a response message sending section connected to the transmit (T) terminal of the speech circuit 3; 5 is a response message storage section that stores a plurality of response messages; 6 is a message recording section that is connected to the receive (R) terminal of the speech circuit 3 and records the line signal in the message storage section; 7 is a message storage section that stores the caller's messages; 8 is a signal level measuring section connected to the line via the loop opening/closing section 2; 9 is a control section such as a CPU; 9A is a threshold setting means; and 10 is a timer for measuring the duration of voiced and silent intervals. On receiving a timer count reset instruction from the control section 9, the timer 10 automatically starts counting time from zero.

Fig. 2 shows a concrete configuration of the signal level measuring section 8 and an example of its interface with the control section 9. In this example circuit, operational amplifier OP1 and the diode D around it form a half-wave rectifier, and operational amplifier OP2 with the capacitor C and resistor R around it forms a low-pass filter. The output of operational amplifier OP2 is then the envelope of the input signal. This envelope is converted to a digital signal by A/D converter 8A and read into the control section 9 at a fixed period.
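As a rough digital counterpart of the analog chain in Fig. 2, the following sketch half-wave rectifies a sampled signal and smooths it with a one-pole low-pass filter. It is an illustration under stated assumptions only: the patent describes an analog circuit followed by an A/D converter, and the function name `envelope` and the smoothing coefficient `alpha` are hypothetical.

```python
import math


def envelope(samples, alpha=0.05):
    """Half-wave rectify, then smooth with a one-pole low-pass filter."""
    out = []
    y = 0.0
    for x in samples:
        rectified = x if x > 0.0 else 0.0   # half-wave rectifier stage
        y += alpha * (rectified - y)        # low-pass stage (exponential smoothing)
        out.append(y)
    return out


# A 400-sample tone followed by 400 samples of silence (illustrative input).
tone = [math.sin(2 * math.pi * i / 20) for i in range(400)]
env = envelope(tone + [0.0] * 400)
print(max(env[200:400]) > 0.2, env[-1] < 0.01)
```

The envelope stays well above zero while the tone is present and decays toward zero during the silence, which is the quantity the control section compares against the thresholds.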

Fig. 3 shows an example of the internal configuration of the response message storage section 5.

In this example, three dialogue response messages are stored in advance.

Figs. 4(a) and 4(b) are flowcharts explaining the operation of this embodiment. (S11) to (S18) and (S21) to (S40) denote the individual steps. The operation of the embodiment is described below with reference to Fig. 4.

When the ringing signal detection section 1 detects that a ringing signal has arrived on the line (S11), the control section 9 operates the loop opening/closing section 2 to close the loop (S12), selects the first response message from the response message storage section 5 (S13), and sends it to the line from the response message sending section 4 via the speech circuit 3 (S14). When the response message has been sent, the control section 9 starts the message recording operation (S15). If the response message number is not the third (= 3), the number is incremented by 1 (S17) and the steps from S14 onward are repeated. When the response message number is found to be the third (= 3) in step S16, the automatic answer is ended (S18).
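The outer loop of Fig. 4(a) can be sketched as follows. This is a hedged illustration, not the actual firmware: ring detection and loop closure (S11, S12) are assumed to have happened already, and `auto_answer`, `send`, and `record` are hypothetical names standing in for the response message sending section and the message recording process.

```python
def auto_answer(messages, send, record):
    """Send each stored response message and record the caller after each one.

    `messages` must be non-empty; `send` and `record` are caller-supplied
    callables (assumptions made for this sketch).
    """
    recordings = []
    index = 0                              # S13: select the first response message
    while True:
        send(messages[index])              # S14: send the response message
        recordings.append(record())        # S15: record the caller's message
        if index == len(messages) - 1:     # S16: was this the last message?
            break                          # S18: end the automatic answer
        index += 1                         # S17: move to the next message
    return recordings


sent = []
replies = iter(["reply 1", "reply 2", "reply 3"])
result = auto_answer(["msg 1", "msg 2", "msg 3"], sent.append, lambda: next(replies))
print(sent, result)
```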

The message recording process has the same basic structure as the utterance end detection process shown in Fig. 5. Fig. 4(b) shows the voice detection processing in detail and also describes the pause compression recording processing concretely. In the voice detection processing of this figure, the voice detection level threshold is varied according to the background noise level so that the detection result is not adversely affected by noise that is steadily present in the background, such as ambient noise around the caller and line noise.

The operation of the message recording process is described in detail below with reference to Fig. 4(b). In this process, the timer is reset and started (S21), and the control section 9 reads the output of the signal level measuring section 8 at a fixed period (S22) and calculates the difference $\Delta$ between the current and the previous readings of the line signal level (S23). Expressed as an equation, this is Equation (1), where $V_n$ is the value read from the signal level measuring section 8 at time $n$ and $V_{n-1}$ is the value read one sampling period earlier.

$$\Delta = V_n - V_{n-1} \qquad (1)$$

The difference $\Delta$ is compared in the control section 9 with the voice onset detection threshold $V_{\mathrm{stage}}$. $V_{\mathrm{stage}}$ is a threshold for detecting the start of speech: if $\Delta$ is greater than or equal to $V_{\mathrm{stage}}$ (S24), a voice interval is regarded as having started. That is, when

$$V_{\mathrm{stage}} \le \Delta_k = V_k - V_{k-1} \qquad (2)$$

holds at some time $k$, a voice interval is judged to have started at time $k$. $V_{k-1}$ can then be regarded as representing the background noise level. When the other party's voice is present, the signal level is the background noise level plus the contribution of the voice, so the voice detection level threshold $V_{th}$ should be set to a value larger than the background noise level. The voice detection threshold $V_{th}$ is therefore generally calculated according to Equation (3), where the constant $f$ is greater than 1.
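The onset test of Equations (1) and (2) can be sketched as a scan over successive level readings. This is an illustration only: the function name `find_onset`, the parameter name `v_onset` (standing for the voice onset detection threshold), and the sample values are hypothetical.

```python
def find_onset(levels, v_onset):
    """Return (k, V[k-1]) for the first k where V[k] - V[k-1] >= v_onset.

    V[k-1] is the reading just before the jump, which the text treats as the
    background noise level. Returns None if no onset is found.
    """
    for k in range(1, len(levels)):
        delta = levels[k] - levels[k - 1]   # Equation (1)
        if delta >= v_onset:                # Equation (2), step S24
            return k, levels[k - 1]
    return None


readings = [0.10, 0.11, 0.10, 0.60, 0.70]   # hypothetical level readings
print(find_onset(readings, v_onset=0.3))
```

Here the jump from the third to the fourth reading exceeds the onset threshold, so the reading just before it is taken as the background noise level from which the detection thresholds are derived.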

$$V_{th} = f \times V_{k-1} \qquad (3)$$

In this embodiment, $V_{th}$ is calculated separately for pause compression and for dialogue response at this point. Specifically, two values $f_a$ and $f_b$ are prepared for the constant $f$ in Equation (3), and the voice detection threshold for dialogue response $V_{th\text{-}a}$ and the voice detection threshold for pause compression $V_{th\text{-}b}$ are calculated by the following equations (S25).

$$V_{th\text{-}a} = f_a \times V_{k-1} \qquad (4)$$

$$V_{th\text{-}b} = f_b \times V_{k-1} \qquad (5)$$

As explained earlier, for dialogue response the voice detection sensitivity is lowered to prevent sound from being detected when the caller is not speaking. That is, the threshold setting means 9A sets $V_{th\text{-}a}$ to a relatively large value. For pause compression, on the other hand, the voice detection sensitivity is raised to prevent silence from being detected while the caller is speaking, which would cause speech to be dropped from the recording. That is, the threshold setting means 9A sets $V_{th\text{-}b}$ to a relatively small value. Accordingly, $f_a$ and $f_b$ satisfy

$$f_a \ge f_b \qquad (6)$$

Voice detection is subsequently performed on the basis of the thresholds $V_{th\text{-}a}$ and $V_{th\text{-}b}$ calculated by Equations (4) and (5).
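A minimal sketch of the two-threshold calculation in Equations (4) to (6) follows. The numeric factors are illustrative assumptions, since the text only requires each factor to exceed 1 and $f_a \ge f_b$; the function name `dual_thresholds` is hypothetical.

```python
def dual_thresholds(background, f_a, f_b):
    """Return (V_th_a, V_th_b) derived from the background noise level.

    f_a is the factor for dialogue response and f_b the factor for pause
    compression; the relation f_a >= f_b > 1 is enforced.
    """
    if not (f_a >= f_b > 1):
        raise ValueError("expected f_a >= f_b > 1")
    return f_a * background, f_b * background   # Equations (4) and (5)


v_th_a, v_th_b = dual_thresholds(background=0.1, f_a=4.0, f_b=2.0)
level = 0.3   # a hypothetical reading between the two thresholds
print(level >= v_th_a, level >= v_th_b)
```

A reading that falls between the two thresholds is treated as silence for dialogue response, which guards against noise being taken for speech, but as voice for pause compression, which guards against dropped speech.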

In pause compression recording, the process checks whether the signal level $V_n$ read from the signal level measuring section 8 and the voice detection threshold for pause compression $V_{th\text{-}b}$ satisfy

$$V_n \ge V_{th\text{-}b} \qquad (7)$$

(S26), (S27). If Equation (7) holds, the other party's voice is judged to need recording (S28), and the message recording section 6 records the line signal into the message storage section 7 via the speech circuit 3. This recording is suspended when Equation (7) ceases to hold (S29) and resumes when Equation (7) holds again.
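The gating rule of Equation (7) can be sketched as a filter over paired samples and level readings. This is an illustration under assumptions: it pairs one stored sample with one level reading, whereas the actual device records the line signal continuously; the function name `pause_compress` is hypothetical.

```python
def pause_compress(samples, levels, v_th_b):
    """Keep only the samples whose level satisfies Equation (7)."""
    return [s for s, v in zip(samples, levels) if v >= v_th_b]


samples = list("abcdef")                       # stand-ins for audio frames
levels = [0.5, 0.1, 0.1, 0.6, 0.7, 0.05]       # hypothetical level readings
print(pause_compress(samples, levels, v_th_b=0.2))
```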

In utterance end detection, on the other hand, the process checks whether the signal level $V_n$ read from the signal level measuring section 8 and the voice detection threshold for dialogue response $V_{th\text{-}a}$ satisfy

$$V_n \ge V_{th\text{-}a} \qquad (8)$$

(S30).

If the count of the timer 10, which was reset and started at the beginning of the message recording process, reaches the silence determination time threshold $T_{\mathrm{mugon}}$ without Equation (8) having held (S31), the other party is judged to have no intention of speaking, the message recording process ends (S32), and the telephone moves on to sending the next response message. If the message recording section 6 has already started recording at that point, the recording is ended before the response message is sent. If Equation (8) holds in step S30 before the count of the timer 10 reaches $T_{\mathrm{mugon}}$, the other party is judged to have started speaking, and processing to detect the end of the utterance begins. For this purpose the timer 10 resets its count (S33) and starts measuring the duration of the silent intervals that appear during the utterance (S34) to (S37). If Equation (8) holds (S38), the count of the timer 10 is reset again (S33). If Equation (8) does not hold (S38), the count of the timer 10 is compared with the utterance end determination time threshold $T_{\mathrm{end}}$ (S39). When the count reaches $T_{\mathrm{end}}$ or more, the other party is judged to have finished speaking, the message recording operation ends (S40), and the next response message is sent.
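The steps above can be sketched as a single loop that applies both thresholds to each level reading: Equation (7) decides what is recorded, and Equation (8) drives the silence and end-of-utterance timers. This is a simplified illustration, not the patented flowchart: the thresholds are passed in rather than derived from onset detection, time is counted in sampling periods, and `record_message` and its return labels are hypothetical.

```python
def record_message(levels, v_th_a, v_th_b, t_mugon, t_end):
    """Return (recorded_indices, outcome) for one message-recording pass."""
    recorded = []        # indices kept by pause compression
    timer = 0            # S21: reset/start the timer
    speaking = False
    for n, v in enumerate(levels):
        if v >= v_th_b:                  # Equation (7): record this reading
            recorded.append(n)
        if v >= v_th_a:                  # Equation (8): voiced for dialogue response
            speaking = True
            timer = 0                    # S33: reset on every voiced reading
            continue
        timer += 1
        if not speaking and timer >= t_mugon:
            return recorded, "no_speech"       # S31, S32
        if speaking and timer >= t_end:
            return recorded, "speech_ended"    # S39, S40
    return recorded, "undecided"


levels = [0.05, 0.3, 0.9, 0.3, 0.05, 0.05, 0.05]   # hypothetical readings
print(record_message(levels, v_th_a=0.4, v_th_b=0.2, t_mugon=3, t_end=3))
```

In this example the readings of 0.3 are recorded because they pass the lower pause compression threshold, while only the 0.9 reading counts as speech for the dialogue timers.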

Steps S35, S36, and S37, like steps S27 to S29, record audio only in the intervals where Equation (7) holds and do not record in other intervals.

The above operation is repeated up to the message recording operation that follows the final response message. When the message recording operation for the final response message ends, the loop opening/closing section 2 is operated again, the line is released, and the automatic answer ends.

〔Effect of the invention〕

As explained above, this invention includes a threshold setting means that can independently set two voice detection thresholds for the voice detection means: one for dialogue response and one for pause compression. Because the two thresholds can be set independently, each can be chosen so that a voice detection error does not have a fatal effect on the corresponding process.

[Brief explanation of drawings]

Fig. 1 is a block diagram showing an embodiment of this invention. Fig. 2 illustrates a concrete configuration example of the signal level measuring section and its interface with the control section. Fig. 3 shows an example of the internal configuration of the response message storage section. Figs. 4(a) and 4(b) are flowcharts showing an example of the operation of the embodiment of Fig. 1. Fig. 5 is a flowchart showing a conventional algorithm that determines, from the voice detection result, whether the caller starts speaking in response to a response message and whether a caller who has started speaking has finished. In the figures, 1 is the ringing signal detection section, 2 is the loop opening/closing section, 3 is the speech circuit, 4 is the response message sending section, 5 is the response message storage section, 6 is the message recording section, 7 is the message storage section, 8 is the signal level measuring section, 8A is the A/D converter, and 9 is the control section.

Claims

[Claims]

An interactive answering machine having a dialogue response function that automatically answers an incoming call and sends out, in an interactive manner in response to the caller's utterances, a plurality of response messages stored in a response message storage section, a voice detection means for detecting the presence or absence of the caller's voice, and a pause compression recording function, the interactive answering machine being characterized by comprising a threshold setting means capable of independently setting two voice detection thresholds for the voice detection means: a voice detection threshold for dialogue response and a voice detection threshold for pause compression.