JP2010206365A

JP2010206365A - Interaction device

Info

Publication number: JP2010206365A
Application number: JP2009047873A
Authority: JP
Inventors: Shoji Onofuji; 祥司尾野藤
Original assignee: Brother Industries Ltd
Current assignee: Brother Industries Ltd
Priority date: 2009-03-02
Filing date: 2009-03-02
Publication date: 2010-09-16

Abstract

PROBLEM TO BE SOLVED: To detect the distance to an operator without requiring an additional dedicated sensor or microphone, and to perform interaction with an appropriate gain.

SOLUTION: A reception terminal 20 includes a microphone 207 for inputting sound and a speaker 208 for outputting sound. The terminal acquires noise information from noise input through the microphone 207 and outputs, through the speaker 208, pseudo noise generated from the acquired noise information. It then acquires reflected sound information from the reflection of the pseudo noise off an object, input through the microphone 207, and performs a predetermined calculation on that information, presuming the object to be a visitor M, to detect the distance to the visitor M. The gain of the microphone 207 is adjusted based on the detection result.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to an interactive device that an operator can operate through spoken dialogue.

Interactive devices that an operator can operate through dialogue, such as a reception device that handles reception work for visitors to a building, are already known. In such a device, the operator's utterance is input through a voice input means such as a microphone, amplified with an appropriate gain, and then subjected to speech recognition. If the gain is too small relative to the utterance level, misrecognition occurs; if the gain is too large, clipping distorts the signal and recognition becomes impossible. In dialogue processing, the operator as a rule utters the same content only once (unless specifically asked to repeat it), so in either case the utterance goes unrecognized. Preventing such recognition failures requires an appropriate gain. In general, when the operator is far from the device, the utterance level reaching the voice input means is low (in other words, the gain must be increased), and when the operator is close, the utterance level is high (the gain must be reduced). Therefore, to optimize the gain, it is preferable to detect the distance from the device to the operator accurately and without contact, and to perform dialogue processing with a gain appropriate to that distance.

For such non-contact distance detection, the prior art described in Patent Document 1, for example, is known. In this prior art, an ultrasonic pulse is generated and emitted toward an object, and the wave reflected by the object (echo pulse) is detected. The propagation time of the ultrasonic pulse is then calculated, and the distance to the object is detected from that propagation time.

Patent Document 1: JP 2005-351897 A

However, applying an ultrasonic distance detection technique such as the above prior art to an interactive device raises the problem that a sensor or microphone dedicated to distance detection must be newly provided.

An object of the present invention is to provide an interactive device that can detect the distance to the operator without newly providing a dedicated sensor or microphone, and that can perform dialogue processing with an appropriate gain.

To achieve the above object, the first invention is an interactive device that an operator can operate through dialogue, comprising: voice input means for inputting sound; voice output means for outputting sound; source sound acquisition means for acquiring source sound information, including a corresponding amplitude or frequency, from a source sound that is input through the voice input means and from which a detection sound for distance detection is generated; detection sound output means for outputting, through the voice output means and within a predetermined time after the voice input means inputs the sound, the detection sound generated based on the source sound information acquired by the source sound acquisition means; reflected sound acquisition means for acquiring corresponding reflected sound information from the reflection of the detection sound off an object, input through the voice input means; distance detection means for performing a predetermined calculation based on the reflected sound information acquired by the reflected sound acquisition means and detecting the distance to the operator on the presumption that the object is the operator; and sensitivity adjustment means for adjusting the gain of the voice input means based on the detection result of the distance detection means.

In the interactive device of the first invention, the distance to the operator is detected using a detection sound for distance detection. Specifically, the source sound from which the detection sound is generated is input through the voice input means, and the source sound acquisition means acquires the corresponding source sound information. Based on this source sound information, the detection sound output means outputs the detection sound through the voice output means. The output detection sound propagates toward the object and is reflected by it; the reflected sound is input through the voice input means, and the reflected sound acquisition means acquires the corresponding reflected sound information. Because the time from emission of the detection sound to the return of its reflection is proportional to the distance from the device to the object, the distance detection means detects the distance to the operator based on the reflected sound information, on the presumption that the object is the operator. The sensitivity adjustment means then adjusts the gain of the voice input means based on the detected distance. When the operator is relatively close, the operator's utterance during dialogue is input at a relatively high level, so the gain of the voice input means is lowered; when the operator is relatively far away, the utterance is input at a relatively low level, so the gain is raised. Dialogue processing can thus be performed at an appropriate signal level, which makes reliable dialogue processing without recognition failures possible.
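The text leaves the "predetermined calculation" unspecified. As one possible illustration only, the sketch below estimates the echo delay by sliding the emitted detection sound over the recorded microphone signal and picking the best-matching lag (a plain cross-correlation), then converts that delay to a distance. The function names, the sample rate, and the choice of cross-correlation are assumptions made for this example and are not taken from the patent.

```python
def estimate_delay_samples(emitted, recorded):
    """Return the lag (in samples) at which `emitted` best matches `recorded`.

    Assumption: a simple cross-correlation stands in for the unspecified
    "predetermined calculation" of the distance detection means.
    """
    if len(recorded) < len(emitted):
        raise ValueError("recording must be at least as long as the emitted sound")
    best_lag, best_score = 0, float("-inf")
    for lag in range(len(recorded) - len(emitted) + 1):
        # Correlation score of the emitted sound against the recording at this lag.
        score = sum(e * recorded[lag + i] for i, e in enumerate(emitted))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def estimate_distance_m(emitted, recorded, sample_rate_hz, speed_of_sound=340.0):
    """Convert the estimated round-trip delay into a one-way distance in metres."""
    round_trip_s = estimate_delay_samples(emitted, recorded) / sample_rate_hz
    return speed_of_sound * round_trip_s / 2.0


if __name__ == "__main__":
    # Hypothetical data: a short burst whose attenuated echo returns after 80 samples.
    burst = [1.0, -1.0, 1.0, 1.0, -1.0]
    echo = [0.0] * 80 + [0.3 * s for s in burst] + [0.0] * 20
    print(estimate_distance_m(burst, echo, sample_rate_hz=8000))
```

In practice the recording would also contain the direct speaker-to-microphone path and room reflections, which this sketch does not attempt to separate from the echo off the operator.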

As described above, in the first invention, the distance to the operator can be detected using sound input and output through the voice input means and the voice output means. That is, by making use of the voice input means (a microphone or the like) and voice output means (a speaker or the like) already provided for dialogue processing, distance detection can be performed without newly providing a separate distance sensor, dedicated microphone, or the like, and reliable dialogue processing can be performed with an appropriate gain based on the detected distance.

The second invention is characterized in that, in the first invention, the source sound acquisition means acquires utterance voice information as the source sound information from an utterance voice, serving as the source sound, that the operator utters during dialogue processing of the interactive device and that is input through the voice input means, and the detection sound output means outputs, through the voice output means, the detection sound generated based on the utterance voice information acquired by the source sound acquisition means.

When a detection sound is generated from a source sound, if the level of the source sound is too low, the level of the output detection sound is also low, and its reflection becomes difficult to detect. An operator who operates a device through dialogue usually wants their speech to be recognized as reliably as possible and therefore speaks slowly and fairly loudly. Generating the detection sound (pseudo utterance sound) from such an utterance and using it for distance detection therefore enables accurate and reliable distance detection. Using the voice the operator is uttering also has the effect that the operator is relatively unlikely to notice that sound is being used for detection. In addition, because the gain can be adjusted by anticipating the operator's position in advance, the adjustment can be more appropriate than adjustment by automatic gain control.

The third invention is characterized in that, in the first invention, the source sound acquisition means acquires device voice information as the source sound information from a device voice, serving as the source sound, that the voice output means outputs during dialogue processing with the operator and that is input through the voice input means, or from a reflection of that device voice, and the detection sound output means outputs, through the voice output means, the detection sound generated based on the device voice information acquired by the source sound acquisition means.

When a detection sound is generated from a source sound, if the level of the source sound is too low, the level of the output detection sound is also low, and its reflection becomes difficult to detect. An interactive device that is operated through dialogue generally speaks to the operator slowly and fairly loudly so that its explanations and guidance are easy to understand. Generating the detection sound (pseudo device voice) from such device speech and using it for distance detection therefore enables accurate and reliable distance detection (in this case, detection is preferably timed at the end of the device's utterance). Using the voice the device utters for dialogue also has the effect that the operator is relatively unlikely to notice that sound is being used for detection, and selecting sound information (frequency or amplitude) that is convenient for distance calculation speeds up the analysis after sound input.

The fourth invention is characterized in that, in the first invention, the source sound acquisition means acquires ambient sound information as the source sound information from an ambient sound, serving as the source sound, that is generated around the device and input through the voice input means, and the detection sound output means outputs, through the voice output means, the detection sound generated based on the ambient sound information acquired by the source sound acquisition means.

By generating the detection sound (pseudo ambient sound) from the ambient sound input through the voice input means and using it for distance detection, the distance can be detected without the operator noticing that sound is being used for detection.

The fifth invention is characterized in that any one of the first to fourth inventions further comprises start control means for starting the gain adjustment by the sensitivity adjustment means when the distance to the operator detected by the distance detection means becomes equal to or less than a predetermined value.

With an interactive device operated through dialogue, the operator generally speaks only after coming relatively close to the device. When the operator is far from the device, the operator is therefore unlikely to begin operating it. Accordingly, in the fifth invention, the start control means starts adjusting the gain of the voice input means once the distance to the operator has become relatively short. This avoids needless adjustment and allows efficient processing.

The sixth invention is characterized in that the fifth invention further comprises setting control means for setting the gain of the voice input means to a predetermined value or more when the distance to the operator detected by the distance detection means is larger than the predetermined value.

As described above, when the operator is far from the device, the operator is unlikely to begin operating it through dialogue. Depending on the operator or the situation, however, the operator may speak from a relatively long distance. In that case, the operator's utterance would be input through the voice input means at a relatively low level. In the sixth invention, therefore, the setting control means sets the gain of the voice input means to a predetermined value or more, raising the signal level. This reduces the possibility of recognition failure even when the device is operated from a distance.
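The threshold behavior of the fifth and sixth inventions can be sketched as a single gain-selection function. This is a minimal illustration under stated assumptions: the threshold, the gain values, and the linear distance-to-gain law are hypothetical and are not given in the text, which says only that the gain is lowered when the operator is near, raised when far, and held at or above a predetermined value beyond the threshold distance.

```python
def select_microphone_gain(distance_m,
                           near_threshold_m=1.5,      # hypothetical "predetermined value" for distance
                           far_gain=8.0,              # hypothetical gain held when the operator is far
                           reference_distance_m=0.5,  # hypothetical calibration distance
                           reference_gain=2.0):       # hypothetical gain at the calibration distance
    """Pick a microphone gain (output/input ratio) from the detected distance."""
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    if distance_m > near_threshold_m:
        # Setting control: beyond the threshold, hold the gain at a high fixed
        # value so that speech from far away is still picked up.
        return far_gain
    # Start control: within the threshold, distance-based adjustment begins.
    # Assumption: gain grows linearly with distance, capped at the far gain.
    gain = reference_gain * distance_m / reference_distance_m
    return min(gain, far_gain)


if __name__ == "__main__":
    for d in (0.5, 1.0, 1.5, 3.0):
        print(d, select_microphone_gain(d))
```

With these example values the gain rises from 2.0 at 0.5 m to 6.0 at 1.5 m and stays at 8.0 beyond the threshold, so a nearer operator always gets a lower gain.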

The seventh invention is characterized in that the sixth invention further comprises end control means for ending the gain adjustment of the voice input means by the sensitivity adjustment means when a predetermined period has elapsed after dialogue processing using the voice input means, whose gain has been adjusted by the sensitivity adjustment means, has ended.

When dialogue processing has been performed after the sensitivity of the voice input means was adjusted according to the distance to the operator, and some time has passed since that dialogue processing ended, the operator has most likely already moved elsewhere, leaving no one near the device. Accordingly, in the seventh invention, the end control means ends the gain adjustment of the voice input means once a predetermined period has elapsed after the end of dialogue processing. The device can thus reliably wait for the next operator with the gain value in effect at that time.
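The end control of the seventh invention amounts to a timeout that begins when dialogue processing finishes. The sketch below is one way to express it; the class name, method names, and the 30-second default are hypothetical, and time is passed in explicitly so the logic does not depend on a particular clock.

```python
class GainAdjustmentSession:
    """Tracks whether distance-based gain adjustment is currently active."""

    def __init__(self, hold_period_s=30.0):  # hypothetical "predetermined period"
        self.hold_period_s = hold_period_s
        self.adjusting = False
        self.dialogue_end_time = None

    def start(self):
        """Start control: begin adjusting the gain."""
        self.adjusting = True
        self.dialogue_end_time = None

    def dialogue_finished(self, now_s):
        """Record the time at which dialogue processing ended."""
        self.dialogue_end_time = now_s

    def tick(self, now_s):
        """End control: stop adjusting once the hold period has elapsed.

        Returns True while adjustment is still active.
        """
        if (self.adjusting
                and self.dialogue_end_time is not None
                and now_s - self.dialogue_end_time >= self.hold_period_s):
            self.adjusting = False
        return self.adjusting
```

After adjustment stops, the gain simply keeps whatever value it last had, which matches the description of waiting for the next operator with the gain in effect at that time.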

According to the present invention, the distance to the operator can be detected without newly providing a dedicated sensor or microphone, and dialogue processing can be performed with an appropriate gain.

A system configuration diagram showing the overall configuration of a visitor reception system according to an embodiment of the present invention.
A functional block diagram showing the functional configuration of the entire visitor reception system.
A diagram showing an example of a display screen on the display unit.
A functional block diagram showing the functional configuration of the reception terminal.
A functional block diagram showing the functional configuration of the DB server.
An explanatory diagram outlining the procedure up to the output of pseudo noise from the speaker.
A diagram outlining the method of detecting the distance to a visitor and the gain control performed when the detected distance is larger than a predetermined value.
A diagram explaining the gain control performed when the detected distance is equal to or less than the predetermined value.
A diagram schematically showing the state after reception processing by the reception terminal has ended.
A flowchart showing a control procedure executed by the control circuit unit of the reception terminal.
A flowchart showing a control procedure executed by the control circuit unit of the reception terminal.
An explanatory diagram outlining the procedure up to the output of a pseudo utterance voice from the speaker, in a modification that uses the visitor's utterance voice as the source sound.
A flowchart showing a control procedure executed by the control circuit unit of the reception terminal.
A flowchart showing a control procedure executed by the control circuit unit of the reception terminal.
An explanatory diagram outlining the procedure up to the output of a pseudo guidance voice from the speaker, in a modification that uses the guidance voice of the reception terminal as the source sound.

An embodiment of the present invention is described below with reference to the drawings. In this embodiment, the interactive device of the present invention is applied to a visitor reception system that handles reception work for visitors to an office building, a company, or other premises.

(A) Basic Configuration of System
FIG. 1 is a system configuration diagram showing the overall configuration of the visitor reception system of this embodiment.

In FIG. 1, a visitor reception system 1 is installed, for example, near the entrance of a company and has a reception terminal 20 (interactive device) that an operator M (in this example, a visitor to the company) can operate through dialogue. The reception terminal 20 is provided with a microphone 207 (voice input means) for inputting sound and a speaker 208 (voice output means) for outputting sound.

The reception terminal 20 performs dialogue processing with the visitor M (in this example, reception processing through dialogue with the visitor M), detects the distance to the visitor M using sound input to and output from the microphone 207 and the speaker 208, and adjusts the gain (described later) of the microphone 207 based on the detected distance. In this embodiment, the distance from the reception terminal 20 to the visitor M is detected by outputting a detection sound for distance detection (in this example, pseudo noise, described later) from the speaker 208 and measuring the time required for the pseudo noise to be reflected by the visitor M and for the reflected sound to be input to the microphone 207. Because this required time is proportional to the distance to the visitor M, the distance can be detected from it. That is, where L is the distance to the visitor M and t is the required time, the relationship
L = c × t / 2 ... (Formula 1)
holds (details are described later with reference to FIG. 7). Here, c is the speed of sound (about 340 m/s, although it varies with the density and pressure of the air serving as the medium).
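As a worked illustration of (Formula 1), the short sketch below converts a measured round-trip time into a distance. The function name and the example timing are assumptions made for illustration; only the formula and the approximate speed of sound come from the text.

```python
SPEED_OF_SOUND_M_PER_S = 340.0  # approximate value quoted in the text


def distance_from_round_trip(t_seconds, c=SPEED_OF_SOUND_M_PER_S):
    """Return the one-way distance L = c * t / 2 for a round-trip time t.

    The division by 2 accounts for the sound travelling to the visitor
    and back to the microphone.
    """
    if t_seconds < 0:
        raise ValueError("round-trip time must be non-negative")
    return c * t_seconds / 2.0


if __name__ == "__main__":
    # Hypothetical measurement: the reflection arrives 10 ms after emission,
    # which corresponds to a visitor roughly 1.7 m away.
    print(distance_from_round_trip(0.010))
```

At this speed of sound, each additional metre of distance adds a little under 6 ms to the round trip, which gives a sense of the timing resolution the terminal needs.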

The distance to the visitor M can be detected by solving (Formula 1) above. The gain of the microphone 207 is then adjusted (set) based on the detected distance (details are described later).
As shown in FIG. 1, the reception terminal 20 has a display unit 210, the microphone 207, and the speaker 208. The display unit 210 is, for example, a liquid crystal display. In this example it is supported via an arm 211 on a horizontally installed base 212, with its screen facing obliquely upward so as to be perpendicular to the line of sight of the visitor M. The microphone 207 is arranged on the base 212 in a substantially arc shape with its tip directed toward the visitor M.

The display unit 210 may be a touch panel so that the visitor M can operate the displayed screen by touching it directly.

FIG. 2 is a functional block diagram showing the functional configuration of the entire visitor reception system 1.

In FIG. 2, the visitor reception system 1 includes the reception terminal 20; a DB server 10 implemented on a well-known personal computer; a plurality of (in this example, two) IP telephones 60 provided for respective employees of the company; and an IP-PBX (Internet Protocol Private Branch Exchange) 50, a well-known exchange that switches lines among the IP telephones 60. All of these are connected via a router 40.

The reception terminal 20 has a terminal body 20A; the display unit 210, a variable gain amplifier 217, and the speaker 208, each connected to the terminal body 20A; and the microphone 207, connected to the variable gain amplifier 217.

The microphone 207 converts input sound into audio information and outputs it to the variable gain amplifier 217.

The variable gain amplifier 217 amplifies the audio information input from the microphone 207 (in this example, the gain of the microphone 207 is determined by a control signal from a CPU 201, described later) and outputs the result to the terminal body 20A. The gain of the microphone 207 is the ratio of output to input (output/input), that is, the degree of amplification that the variable gain amplifier 217 applies under the control of the CPU 201.
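The gain definition above (output divided by input) can be written out directly. The decibel conversion in the sketch is the standard amplitude-ratio formula, added here only for reference; the text itself does not say in which units the CPU 201 specifies the gain, and the function names are hypothetical.

```python
import math


def amplification_ratio(input_level, output_level):
    """Gain as defined in the text: output / input."""
    if input_level <= 0:
        raise ValueError("input level must be positive")
    return output_level / input_level


def ratio_to_decibels(ratio):
    """Express an amplitude ratio in decibels (20 * log10 of the ratio)."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 20.0 * math.log10(ratio)


if __name__ == "__main__":
    gain = amplification_ratio(input_level=0.5, output_level=5.0)
    print(gain, ratio_to_decibels(gain))
```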

The speaker 208 converts the audio signal input from the terminal body 20A into a notification sound (guidance voice) for the visitor M or a detection sound for distance detection, and outputs it. In this example the detection sound is pseudo noise, described later; it may instead be a pseudo device voice or a pseudo utterance voice (see modifications (1) and (2), described later).

FIG. 3 shows an example of a display screen on the display unit 210. On this screen, a virtual person IM who performs the reception work, generated by a drawing program described later, is displayed together with an office-style background G when reception processing (described later) starts. A sentence B corresponding to the voice output from the speaker 208 (abbreviated as "***" in the figure) is displayed as well.

(B) Detailed Functions of Reception Terminal
FIG. 4 is a functional block diagram showing the functional configuration of the reception terminal 20.

In FIG. 4, the terminal body 20A of the reception terminal 20 has a control circuit unit 200, an input/output (I/O) interface 204, a hard disk drive (HDD) 205, and a timer 209 serving as time measuring means.

The control circuit unit 200 includes the CPU 201, a ROM 202 storing the programs necessary for the basic operation of the reception terminal 20 and their setting values, and a RAM 203 that temporarily stores various data. The CPU 201 controls the overall operation of the reception terminal 20 according to programs stored in the ROM 202 and the HDD 205.

The CPU 201, the HDD 205, the timer 209, the display unit 210, the variable gain amplifier 217, the speaker 208, and a network (NW) card 206 are connected to the I/O interface 204.

The HDD 205 has a plurality of storage areas, including a language model storage area 252 used for speech recognition, a word dictionary storage area 253, a visitor dictionary storage area 254 used for speech recognition to identify visitors, and a program storage area 256.

The program storage area 256 stores, for example, a plurality of programs for controlling the various operations of the reception terminal 20. These include a system program that controls the basic operation of the reception terminal 20, a communication program that controls communication with the DB server 10, a drawing program that generates images to be displayed on the display unit 210, a speech recognition program that executes the speech recognition described above, a DB collation program for accessing and collating against the database of the DB server 10, a speech synthesis program, a dialogue control program, a telephone connection program for connecting the IP telephones 60 and the IP-PBX 50, a distance detection program that controls the distance detection described above, and a sensitivity adjustment program that controls the adjustment of the gain of the microphone 207 described above.

なお、図示はされていないが、ＨＤＤ２０５には、その他、音声認識処理で一般的に使用される周知の音響モデルや、各種処理で使用される設定値等も記憶されている。なお、詳細は説明しないが、音響モデルは、音声の音響的特徴を統計的にモデル化したもので、例えば、母音、子音のそれぞれについて、音響的特徴（例えば、周波数特性）と対応する音素とで表現されている。 Although not shown, the HDD 205 also stores a well-known acoustic model generally used in voice recognition processing, setting values used in various processes, and the like. Although not described in detail, the acoustic model statistically models the acoustic features of speech; for example, each vowel and consonant is represented by its acoustic features (for example, frequency characteristics) and the corresponding phoneme.

ＮＷカード２０６は、上記ルータ４０に接続され、ＤＢサーバ１０などとの間でデータの送受信を可能とするための拡張カードである。 The NW card 206 is connected to the router 40 and is an expansion card for enabling data transmission / reception with the DB server 10 or the like.

（Ｃ）ＤＢサーバの詳細機能 (C) Detailed Function of DB Server
図５は、ＤＢサーバ１０の機能的構成を表す機能ブロック図である。 FIG. 5 is a functional block diagram showing the functional configuration of the DB server 10.

図５に示すように、ＤＢサーバ１０は、ＣＰＵ１０１と、ＣＰＵ１０１に各々接続されたＲＯＭ１０２及びＲＡＭ１０３と、ＣＰＵ１０１に接続された入出力（Ｉ／Ｏ）インタフェイス１０４と、Ｉ／Ｏインタフェイス１０４にそれぞれ接続された、マウスコントローラ１０６、キーコントローラ１０７、ビデオコントローラ１０８、通信装置１０９、及びハードディスク装置（ＨＤＤ）１５０とを有している。 As shown in FIG. 5, the DB server 10 includes a CPU 101, a ROM 102 and a RAM 103 connected to the CPU 101, an input / output (I / O) interface 104 connected to the CPU 101, and an I / O interface 104. A mouse controller 106, a key controller 107, a video controller 108, a communication device 109, and a hard disk device (HDD) 150 are connected to each other.

ＲＯＭ１０２は、ＢＩＯＳを含む、ＤＢサーバ１０を動作させるための各種のプログラムを記憶している。ＲＡＭ１０３は、各種データを一時的に記憶する。ＣＰＵ１０１は、ＲＯＭ１０２や、後述するＨＤＤ１５０に記憶されたプログラムに従って、ＤＢサーバ１０の全体の制御を司る。 The ROM 102 stores various programs including the BIOS for operating the DB server 10. The RAM 103 temporarily stores various data. The CPU 101 governs overall control of the DB server 10 according to programs stored in the ROM 102 and an HDD 150 described later.

マウスコントローラ１０６、キーコントローラ１０７、及びビデオコントローラ１０８には、それぞれマウス１１６、キーボード１１７、及びディスプレイ１１８が接続されている。通信装置１０９は、ルータ４０に接続され、受付端末２０等、外部機器との間でデータの送受信を行うことを可能とする。 A mouse 116, a keyboard 117, and a display 118 are connected to the mouse controller 106, the key controller 107, and the video controller 108, respectively. The communication device 109 is connected to the router 40 and can exchange data with an external device such as the reception terminal 20.

ＨＤＤ１５０は、予定された来訪者Ｍに関する来訪情報を格納する来訪者予約データベース（ＤＢ）記憶エリア１５１、社員情報を格納する社員データベース（ＤＢ）記憶エリア１５５、及びプログラム記憶エリア１５６を含む複数の記憶エリアを備えている。 The HDD 150 has a plurality of storages including a visitor reservation database (DB) storage area 151 for storing visit information related to the scheduled visitor M, an employee database (DB) storage area 155 for storing employee information, and a program storage area 156. Has an area.

プログラム記憶エリア１５６には、システムプログラム、通信プログラム等、各種処理をＤＢサーバ１０に実行させるための各種プログラムが記憶されている。なお、これらのプログラムは、例えばＣＤ−ＲＯＭに記憶されたものがＣＤ−ＲＯＭドライブ（図示せず）を介してインストールされ、プログラム記憶エリア１５６に記憶される。又は、適宜のネットワークを介してシステム外部からダウンロードされたプログラムが記憶されてもよい。 The program storage area 156 stores various programs for causing the DB server 10 to execute various processes such as a system program and a communication program. For example, those programs stored in a CD-ROM are installed via a CD-ROM drive (not shown) and stored in the program storage area 156. Alternatively, a program downloaded from outside the system via an appropriate network may be stored.

（Ｄ）ゲインの調節の流れ (D) Flow of Gain Adjustment
以上のような構成の本実施形態の最大の特徴は、マイク２０７を介し取得された雑音情報に基づき距離検出用の疑似雑音を生成しスピーカ２０８を介し出力することと、マイク２０７を介して取得した上記疑似雑音の反射音情報に基づき来訪者Ｍまでの距離を検出することと、その距離検出結果に基づきマイク２０７のゲインを調整することである。以下、その詳細を順を追って説明する。 The greatest features of the present embodiment configured as described above are that pseudo noise for distance detection is generated based on noise information acquired through the microphone 207 and output through the speaker 208, that the distance to the visitor M is detected based on the reflected sound information of the pseudo noise acquired through the microphone 207, and that the gain of the microphone 207 is adjusted based on the distance detection result. The details are described in order below.

図６は、スピーカ２０８より疑似雑音を出力するまでの手順の概要を説明した説明図である。 FIG. 6 is an explanatory diagram for explaining an outline of the procedure until the pseudo noise is output from the speaker 208.

図６（ａ）には、マイク２０７に入力された雑音より、疑似雑音情報を生成する手順を模式的に示している。図６（ａ）に示すように、受付端末２０の周囲で、疑似雑音の生成元となるソース音としての雑音（周囲音。この例では、会社内の所定の場所に設置されたドア３０が閉まる音）が発生すると、この雑音が伝搬してマイク２０７に入力される。これにより、入力した雑音に対応する振幅（あるいは周波数でもよい。以下同様）を含むソース音情報としての雑音情報（周囲音情報）が取得され、この雑音情報に基づき、疑似雑音情報が生成される。なお、このとき、疑似雑音情報のサンプリング周波数が１６［ｋＨｚ］以上となるように生成してもよい。その場合、市販の音響モデルとの互換性を確保できるので、より利便性・応用性が高くなる。 FIG. 6A schematically shows the procedure for generating pseudo-noise information from noise input to the microphone 207. As shown in FIG. 6A, when noise serving as the source sound from which the pseudo noise is generated (ambient sound; in this example, the sound of a door 30, installed at a predetermined location in the company, closing) occurs around the reception terminal 20, the noise propagates and is input to the microphone 207. Noise information (ambient sound information) is thereby acquired as source sound information that includes the amplitude of the input noise (or its frequency; the same applies hereinafter), and pseudo-noise information is generated based on this noise information. The pseudo-noise information may be generated so that its sampling frequency is 16 [kHz] or more. In that case, compatibility with commercially available acoustic models can be ensured, which improves convenience and applicability.

図６（ｂ）には、スピーカ２０８より疑似雑音が出力された状態を模式的に示している。図６（ｂ）に示すように、上記図６（ａ）のようにして疑似雑音情報が生成されると、この疑似雑音情報に基づき、距離検出用の疑似雑音がスピーカ２０８より出力される。なお、疑似雑音は、当該疑似雑音の生成元となった雑音がマイク２０７に入力されてから所定時間（例えば、１［ｍｓｅｃ］）以内にスピーカ２０８より出力される。 FIG. 6B schematically shows a state in which pseudo noise is output from the speaker 208. As shown in FIG. 6B, when pseudo noise information is generated as shown in FIG. 6A, pseudo noise for distance detection is output from the speaker 208 based on the pseudo noise information. The pseudo noise is output from the speaker 208 within a predetermined time (for example, 1 [msec]) after the noise that is the generation source of the pseudo noise is input to the microphone 207.

また、スピーカ２０８より疑似雑音が出力されるのとほぼ同時に、タイマ２０９（図４参照）が起動される。これにより、スピーカ２０８より疑似雑音が出力されてから、この疑似雑音が対象物に反射し、その反射音がマイク２０７に入力されるまでの所要時間（以下、単に「所要時間」という）の測定（計測）が開始される。本実施形態では、上記疑似雑音の反射音の反射元となった対象物が、来訪者Ｍであるとみなし（推測し）、後述のようにして来訪者Ｍまでの距離を検出する。 Also, the timer 209 (see FIG. 4) is started almost simultaneously with the output of the pseudo noise from the speaker 208. This starts the measurement of the time required from the output of the pseudo noise from the speaker 208 until the pseudo noise is reflected by an object and the reflected sound is input to the microphone 207 (hereinafter simply referred to as the "required time"). In the present embodiment, the object that reflected the pseudo noise is assumed (estimated) to be the visitor M, and the distance to the visitor M is detected as described later.

図７は、来訪者Ｍまでの距離を検出する手法の概要、及び、検出した距離に応じたゲイン制御の内容を説明した図である。 FIG. 7 is a diagram for explaining the outline of the method for detecting the distance to the visitor M and the content of the gain control according to the detected distance.

図７において、上記図６（ｂ）のようにしてスピーカ２０８より出力された疑似雑音は、所定の距離範囲（伝搬可能な距離範囲。パワーによって異なる）に伝搬される。このとき、当該範囲内に来訪者Ｍが存在すると、上記疑似雑音は、来訪者Ｍにより反射し、その反射音が伝搬してマイク２０７に入力される。これにより、対応する反射音情報が取得される。そして、マイク２０７に反射音が入力されると、タイマ２０９によって行われていた上記所要時間の測定が終了する。すなわち、このときのタイマ２０９の測定値が上記所要時間となる。 In FIG. 7, the pseudo noise output from the speaker 208 as shown in FIG. 6B is propagated to a predetermined distance range (a distance range in which propagation is possible, depending on power). At this time, if the visitor M exists within the range, the pseudo noise is reflected by the visitor M, and the reflected sound propagates and is input to the microphone 207. Thereby, the corresponding reflected sound information is acquired. When the reflected sound is input to the microphone 207, the measurement of the required time performed by the timer 209 ends. That is, the measured value of the timer 209 at this time is the required time.

また、この例では、上記タイマ２０９による計時を開始してから予め定められた所定の最小音波受音時間を経過するまでは上記反射音情報の取得は開始されないようになっている。上記最小音波受音時間とは、スピーカ２０８より出力された疑似雑音が、来訪者Ｍに反射することなく、直接マイク２０７に入力されるまで（＝いわゆる疑似雑音のスピーカ２０８からマイク２０７への周り込み）の所要時間である。例えば、スピーカ２０８とマイク２０７との間の距離が３０［ｃｍ］であるとすると、最小音波受音時間は１．７３［ｍｓｅｃ］となる。タイマ２０９の測定時間が最小音波受音時間を経過するまで、マイク２０７には、疑似雑音の反射音は入力されない。したがって、最小音波受音時間が経過するまで反射音情報の取得を開始せずに待つことで、マイク２０７に入力する不要な音声（上記周り込みした疑似雑音）を除外することができる。 Further, in this example, acquisition of the reflected sound information does not start until a predetermined minimum sound wave reception time has elapsed after the timer 209 starts timing. The minimum sound wave reception time is the time required for the pseudo noise output from the speaker 208 to be input directly to the microphone 207 without being reflected by the visitor M (that is, the so-called wraparound of the pseudo noise from the speaker 208 to the microphone 207). For example, if the distance between the speaker 208 and the microphone 207 is 30 [cm], the minimum sound wave reception time is 1.73 [msec]. Until the time measured by the timer 209 exceeds the minimum sound wave reception time, no reflected sound of the pseudo noise is input to the microphone 207. Therefore, by waiting until the minimum sound wave reception time has elapsed before starting to acquire the reflected sound information, the unwanted sound input to the microphone 207 (the wraparound pseudo noise) can be excluded.

さらに、この例では、上記タイマ２０９による計時を開始してから予め定められた所定の最大音波受音時間を経過すると、反射音情報の取得は終了され、再び雑音情報の取得が開始されるようになっている。上記最大音波受音時間とは、スピーカ２０８より出力された疑似雑音が、所定の距離（例えば、受付端末２０による受付処理が行われる可能性がある最大距離）より反射され、その反射音がマイク２０７に入力されるまでの所要時間である。例えば、上記最大距離を１００［ｃｍ］とすると、最大音波受音時間は５．７７［ｍｓｅｃ］となる。この最大音波受音時間を経過した後、マイク２０７に入力された疑似雑音の反射音は、上記所定の距離より遠い（受付をするには遠い）位置に存在する対象物（来訪者Ｍとは限らない）により反射されたものである。タイマ２０９の測定時間が最大音波受音時間を経過すると、反射音情報の取得を終了とすることで、不要な反射音を除外することができる。 Furthermore, in this example, when a predetermined maximum sound wave reception time has elapsed after the timer 209 starts timing, acquisition of the reflected sound information ends and acquisition of noise information starts again. The maximum sound wave reception time is the time required for the pseudo noise output from the speaker 208 to be reflected at a predetermined distance (for example, the maximum distance at which reception processing by the reception terminal 20 may be performed) and for the reflected sound to be input to the microphone 207. For example, when the maximum distance is 100 [cm], the maximum sound wave reception time is 5.77 [msec]. A reflected sound of the pseudo noise that is input to the microphone 207 after the maximum sound wave reception time has elapsed was reflected by an object (not necessarily the visitor M) located farther away than the predetermined distance (too far for reception). By ending the acquisition of the reflected sound information once the time measured by the timer 209 exceeds the maximum sound wave reception time, unwanted reflected sounds can be excluded.
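The acquisition window described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names are hypothetical, the speed of sound (346.5 m/s) is taken from the worked example in the description, and the expression 2 × distance / speed is used only because it reproduces both quoted figures (1.73 msec for 30 cm and 5.77 msec for 100 cm). Treating the lower bound as inclusive and the upper bound as exclusive is also an assumption.

```python
SPEED_OF_SOUND_M_S = 346.5  # value used in the description's worked example


def quoted_time_ms(distance_m: float, speed_m_s: float = SPEED_OF_SOUND_M_S) -> float:
    """Time in msec for sound to cover distance_m twice (assumed formula)."""
    return 2.0 * distance_m / speed_m_s * 1000.0


# Window limits for the example distances given in the text.
MIN_RECEIVE_MS = quoted_time_ms(0.30)  # speaker-to-microphone spacing of 30 cm
MAX_RECEIVE_MS = quoted_time_ms(1.00)  # maximum reception distance of 100 cm


def in_acquisition_window(elapsed_ms: float,
                          min_ms: float = MIN_RECEIVE_MS,
                          max_ms: float = MAX_RECEIVE_MS) -> bool:
    """True while reflected sound should be accepted (boundary handling assumed)."""
    return min_ms <= elapsed_ms < max_ms


print(round(MIN_RECEIVE_MS, 2), round(MAX_RECEIVE_MS, 2))
```

With these defaults, a timer reading of 2.0 msec falls inside the window, while readings of 1.0 msec (wraparound) and 6.0 msec (object too far away) fall outside it.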

また、上記疑似雑音及びその反射音は、共に音波であるので受付端末２０と来訪者Ｍとの間を音速で伝搬している。また、上記所要時間は、上記疑似雑音及びその反射音、すなわち音波が、受付端末２０と来訪者Ｍとの間を往復する往復伝搬時間である（詳細にはスピーカ２０８→来訪者Ｍ間の疑似雑音の伝搬時間と、来訪者Ｍ→マイク２０７間の反射音の伝搬時間との合計時間）。すなわち、音速と、上記所要時間の半分（＝片道の伝搬時間に相当）との積の値が、受付端末２０から来訪者Ｍまでの距離となる。このようなことから、上記の（式１）（図１参照）を解くことによって、受付端末２０から来訪者Ｍまでの距離を検出（算出）することができるのである。 Since the pseudo noise and its reflected sound are both sound waves, they propagate between the reception terminal 20 and the visitor M at the speed of sound. The required time is the round-trip propagation time taken by the pseudo noise and its reflected sound, that is, the sound wave, to travel back and forth between the reception terminal 20 and the visitor M (more precisely, the sum of the propagation time of the pseudo noise from the speaker 208 to the visitor M and the propagation time of the reflected sound from the visitor M to the microphone 207). In other words, the product of the speed of sound and half of the required time (which corresponds to the one-way propagation time) is the distance from the reception terminal 20 to the visitor M. The distance from the reception terminal 20 to the visitor M can therefore be detected (calculated) by solving (Formula 1) above (see FIG. 1).

例えば、音速を３４６．５［ｍ／ｓ］とし、タイマ２０９の測定値（＝上記所要時間）を２．０［ｍｓｅｃ］とすると、来訪者Ｍまでの距離Ｌは、
Ｌ＝３４６．５×２．０×１０^−３／２＝３４６．５×１０^−３［ｍ］≒３５［ｃｍ］
となる。 For example, if the speed of sound is 346.5 [m/s] and the measured value of the timer 209 (= the required time) is 2.0 [msec], the distance L to the visitor M is
L = 346.5 × 2.0 × 10^−3 / 2 = 346.5 × 10^−3 [m] ≈ 35 [cm].
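The calculation above (speed of sound multiplied by half of the required time) can be checked with a short sketch. The function name and the default argument are illustrative assumptions; only the formula and the example values come from the description.

```python
def distance_to_visitor_m(required_time_ms: float, speed_m_s: float = 346.5) -> float:
    """Distance L = speed of sound x (round-trip required time / 2)."""
    one_way_time_s = (required_time_ms / 1000.0) / 2.0
    return speed_m_s * one_way_time_s


# Worked example from the text: 2.0 msec measured by the timer.
distance_m = distance_to_visitor_m(2.0)
print(f"{distance_m:.4f} m, about {round(distance_m * 100)} cm")
```

For the 2.0 msec example this gives 0.3465 m, which rounds to the 35 cm stated in the text.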

そして、図７の下段に示すように、上記の方法により検出された受付端末２０から来訪者Ｍまでの距離Ｌが所定値Ｌ０（例えば８０［ｃｍ］）よりも大きかった（Ｌ＞Ｌ０）場合には、マイク２０７のゲイン（＝上記ゲイン可変アンプ２１７での増幅度）を、所定値以上に（この例では固定的に）設定するようになっている。 Then, as shown in the lower part of FIG. 7, when the distance L from the reception terminal 20 to the visitor M detected by the above method is larger than a predetermined value L0 (for example, 80 [cm]) (L> L0) In this case, the gain of the microphone 207 (= the degree of amplification in the gain variable amplifier 217) is set to a predetermined value or more (in this example, fixed).

一方、上記検出した距離Ｌが所定値Ｌ０以下であった場合のゲイン制御の内容を図８を用いて説明する。この図８は上記図７に対応する図である。 On the other hand, the content of gain control when the detected distance L is equal to or less than the predetermined value L0 will be described with reference to FIG. FIG. 8 corresponds to FIG.

図８において、この例では、来訪者Ｍが、上記図７の状態に比べて少し受付端末２０に近づき、これによって受付端末２０から来訪者Ｍまでの距離Ｌが上記所定値Ｌ０以下となっている。そして、このＬ≦Ｌ０が検出されたことに応じて、マイク２０７のゲインの調整が開始され、検出された距離に応じて上記ゲインの値が制御されるようになっている。 In FIG. 8, in this example, the visitor M is slightly closer to the reception terminal 20 than in the state of FIG. 7, whereby the distance L from the reception terminal 20 to the visitor M becomes the predetermined value L0 or less. Yes. Then, in response to the detection of L ≦ L0, the adjustment of the gain of the microphone 207 is started, and the gain value is controlled in accordance with the detected distance.

本実施形態の受付端末２０では、例えば前述のようにしてマイク２０７のゲインの調整が開始されるのとほぼ同じタイミングで（すなわち上記Ｌ≦Ｌ０となったときに）受付処理が開始される（あるいは、所定の開始操作が行われることにより受付処理が開始されるようにしてもよい。後述の（１）及び（２）の変形例参照）。受付処理は、表示部２１０に所定の表示画面を表示しつつ、対話方式により行われる。 In the reception terminal 20 of the present embodiment, the reception process is started, for example, at almost the same timing as the adjustment of the gain of the microphone 207 is started as described above (that is, when L ≦ L0 is satisfied). Alternatively, the reception process may be started when a predetermined start operation is performed (see modifications (1) and (2) described later). The reception process is performed interactively while a predetermined display screen is displayed on the display unit 210.

すなわち、受付処理では、スピーカ２０８より所定の案内音声（例えば、「いらっしゃいませ。どちら様でしょうか」等）が出力され、さらにこれに併せて表示部２１０に所定の表示画面が表示される（例えば、前述した図３参照）。来訪者Ｍがこれら案内音声や表示に応じて、受付端末２０に対して発話すると、対応する来訪者Ｍの発話音声がマイク２０７によって入力される。 That is, in the reception process, a predetermined guidance voice (for example, "Welcome. May I ask who you are?") is output from the speaker 208, and a predetermined display screen is displayed on the display unit 210 at the same time (see, for example, FIG. 3 described above). When the visitor M speaks to the reception terminal 20 in response to the guidance voice and the display, the visitor M's speech is input through the microphone 207.

このとき、マイク２０７に入力された来訪者Ｍの発話音声の情報は、上記ゲイン可変アンプ２１７で、上記検出された距離に基づいて調整されたゲイン（増幅度）に応じて増幅される。そして、増幅された音声情報は、端末本体２０Ａ（図２等参照）に入力され、上記来訪者辞書記憶エリア２５４に記憶された来訪者辞書を用いて、音声認識が行われる（詳細は図１１で後述する）。また、来訪者Ｍが、受付端末２０を操作（発話による操作）している期間も、来訪者Ｍ（すなわち受付端末２０を操作中の来訪者Ｍ）の距離検出（図６〜図８参照）は引き続き行われ、その検出距離の結果に基づき上記ゲインの調整も随時行われている。 At this time, the information on the visitor M's speech input to the microphone 207 is amplified by the gain variable amplifier 217 with the gain (amplification degree) adjusted based on the detected distance. The amplified voice information is input to the terminal body 20A (see FIG. 2 and others), and voice recognition is performed using the visitor dictionary stored in the visitor dictionary storage area 254 (details are described later with reference to FIG. 11). While the visitor M is operating the reception terminal 20 (by speech), detection of the distance to the visitor M (that is, the visitor M operating the reception terminal 20) continues (see FIGS. 6 to 8), and the gain is adjusted as needed based on the detected distance.

図９（ａ）及び図９（ｂ）は、受付端末２０による受付処理終了後の状態を模式的に表した図である。受付端末２０による受付処理が終了した直後では、来訪者Ｍは、まだ対話操作を行っていた受付端末２０の近傍に存在している（図９（ａ）に示した状態）。そして、受付処理が終了した後、所定期間（例えば１０秒）が経過すると、受付端末２０の近傍に存在した来訪者Ｍは、受付処理での応対結果（例えば担当者の指示により待合室で待機するよう応対される等）に応じて、受付端末２０から離れ、別の場所（例えば、待合室や担当者のいる居室等）に移動し、受付端末２０の近傍には誰もいない状態となる（図９（ｂ）に示した状態）。 FIGS. 9A and 9B schematically show the state after the reception process by the reception terminal 20 has ended. Immediately after the reception process ends, the visitor M is still near the reception terminal 20 at which the dialogue took place (the state shown in FIG. 9A). Once a predetermined period (for example, 10 seconds) has elapsed after the reception process ends, the visitor M who was near the reception terminal 20 leaves it according to the outcome of the reception process (for example, being told, on the instructions of the person in charge, to wait in the waiting room) and moves to another place (for example, the waiting room or the room where the person in charge is), so that nobody is near the reception terminal 20 (the state shown in FIG. 9B).

本実施形態では、上記図９（ｂ）に示した状態、すなわち、受付処理が終了した後、所定期間が経過したら、上記マイク２０７のゲインを、現在設定されているゲイン（受付処理終了後、所定期間が経過する直前のゲイン）にこの例では固定的に設定し、上記受付処理中に行われていたゲインの調整を終了させる。そして、このようにゲインが固定された状態で、上記図６（ａ）の状態に戻り、次の来訪者Ｍが来るのを待つようになっている。 In the present embodiment, in the state shown in FIG. 9B, that is, once the predetermined period has elapsed after the reception process ends, the gain of the microphone 207 is fixed (in this example) at the currently set gain (the gain immediately before the predetermined period elapsed), and the gain adjustment that was performed during the reception process is ended. With the gain fixed in this way, the process returns to the state of FIG. 6A and waits for the next visitor M to arrive.

（Ｅ）制御手順 (E) Control Procedure
図１０は、以上説明した内容を実現するために、受付端末２０の制御回路部２００により実行する制御手順を表すフローチャートである。なお、このフローに示す処理は、ＨＤＤ２０５のプログラム記憶エリア２５６に記憶された来訪者受付処理用のプログラム群（前述のシステムプログラム、描画プログラム、音声認識プログラム、対話制御プログラム、距離検出プログラム、感度調整プログラム等）に従って、ＣＰＵ２０１が実行するものである。 FIG. 10 is a flowchart showing the control procedure executed by the control circuit unit 200 of the reception terminal 20 to realize what has been described above. The processing shown in this flow is executed by the CPU 201 in accordance with the group of programs for visitor reception processing stored in the program storage area 256 of the HDD 205 (the aforementioned system program, drawing program, voice recognition program, dialogue control program, distance detection program, sensitivity adjustment program, and so on).

図１０において、例えば受付端末２０の電源ＯＮによって、このフローが開始される（「ＳＴＡＲＴ」位置）。 In FIG. 10, for example, this flow is started when the receiving terminal 20 is turned on (“START” position).

まずステップＳ１０で、所定の初期化処理を実行する（このとき、マイク２０７のゲインの調整を開始したことを表す調整開始フラグＦｓをＦｓ＝０への初期化も併せて行う）。 First, in step S10, a predetermined initialization process is executed (at this time, the adjustment start flag Fs, which indicates that adjustment of the gain of the microphone 207 has started, is also initialized to Fs = 0).

そして、ステップＳ２０において、距離検出用の疑似雑音の生成元となる雑音をマイク２０７、ゲイン可変アンプ２１７、及びＩ／Ｏインタフェイス２０４を介して入力し、雑音情報を取得する（ソース音取得手段としての機能）。 In step S20, noise that is a source of generation of pseudo noise for distance detection is input via the microphone 207, the gain variable amplifier 217, and the I / O interface 204, and noise information is acquired (source sound acquisition means). As a function).

その後、ステップＳ３０で、上記ステップＳ２０で取得した雑音情報に所定の処理を行い、疑似雑音情報を生成する。 Thereafter, in step S30, a predetermined process is performed on the noise information acquired in step S20 to generate pseudo noise information.

そして、ステップＳ４０に移り、Ｉ／Ｏインタフェイス２０４を介してスピーカ２０８に上記生成した疑似雑音情報を出力し、疑似雑音を出力させる（検出音出力手段としての機能）。 In step S40, the generated pseudo noise information is output to the speaker 208 via the I / O interface 204, and the pseudo noise is output (function as detection sound output means).

その後、ステップＳ５０で、Ｉ／Ｏインタフェイス２０４を介してタイマ２０９に制御信号を出力し、タイマ２０９を起動させる。これにより、上記ステップＳ４０で出力した疑似雑音が対象物（来訪者Ｍ）に反射し、後述のステップＳ７０でその反射音がマイク２０７に入力されるまでの所要時間の測定（計時測定）が開始される。 Thereafter, in step S50, a control signal is output to the timer 209 via the I / O interface 204 to start the timer 209. Thereby, the pseudo noise output in step S40 is reflected on the object (visitor M), and measurement (time measurement) is started until the reflected sound is input to the microphone 207 in step S70 described later. Is done.

そして、ステップＳ６０に移り、タイマ２０９の測定時間に基づき、測定時間が前述の最小音波受音時間を経過したか否かを判定する。最小音波受音時間を経過するまでは判定が満たされずループ待機し、最小音波受音時間を経過したら判定が満たされて、ステップＳ７０に移る。 Then, the process proceeds to step S60, and based on the measurement time of the timer 209, it is determined whether or not the measurement time has passed the minimum sound wave reception time. The determination is not satisfied until the minimum sound wave receiving time elapses, and the loop stands by. When the minimum sound wave receiving time elapses, the determination is satisfied, and the process proceeds to step S70.

ステップＳ７０では、マイク２０７、ゲイン可変アンプ２１７、及びＩ／Ｏインタフェイス２０４を介して、上記疑似雑音の（対象物での）反射音を入力した否かを判定する。この判定は、上記疑似雑音情報とマイク２０７、ゲイン可変アンプ２１７、及びＩ／Ｏインタフェイス２０４を介して入力した音声の情報とのパワースペクトルを比較する等の公知の手法により行えば足りる。上記疑似雑音の反射音を入力していない場合には、判定が満たされずステップＳ８０に移る。 In step S70, it is determined whether or not the reflected sound of the pseudo noise (from the object) has been input via the microphone 207, the gain variable amplifier 217, and the I/O interface 204. It suffices to make this determination by a known method, such as comparing the power spectrum of the pseudo-noise information with that of the sound information input via the microphone 207, the gain variable amplifier 217, and the I/O interface 204. If the reflected sound of the pseudo noise has not been input, the determination is not satisfied and the process proceeds to step S80.
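The description leaves the comparison method to known techniques. As one possible illustration, the sketch below computes power spectra with a naive DFT from the standard library and compares them by cosine similarity. The similarity measure, the 0.9 threshold, the equal-length assumption for the two sample buffers, and all function names are assumptions made for this example, not details from the patent.

```python
import cmath
import math


def power_spectrum(samples):
    """Power at each non-negative frequency bin, via a naive O(n^2) DFT."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def spectral_similarity(a, b):
    """Cosine similarity of two power spectra (1.0 means same spectral shape)."""
    pa, pb = power_spectrum(a), power_spectrum(b)
    dot = sum(x * y for x, y in zip(pa, pb))
    norm = math.sqrt(sum(x * x for x in pa)) * math.sqrt(sum(y * y for y in pb))
    return dot / norm if norm > 0 else 0.0


def is_reflection(emitted, received, threshold=0.9):
    """Assumed decision rule: spectra are similar enough to count as an echo."""
    return spectral_similarity(emitted, received) >= threshold


# Synthetic check: an attenuated copy matches, a different tone does not.
N = 64
emitted = [math.sin(2 * math.pi * 4 * i / N) for i in range(N)]
echo = [0.3 * x for x in emitted]
other = [math.sin(2 * math.pi * 10 * i / N) for i in range(N)]
print(is_reflection(emitted, echo), is_reflection(emitted, other))
```

Cosine similarity ignores overall level, so an attenuated echo still matches the emitted pseudo noise; a practical detector would also have to handle time alignment and background noise, which this sketch does not attempt.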

ステップＳ８０では、タイマ２０９の測定時間に基づき、測定時間が前述の最大音波受音時間を経過したか否かを判定する。最大音波受音時間を経過していない場合には、判定が満たされず上記ステップＳ７０に戻り、同様の手順を繰り返す。最大音波受音時間を経過した場合には、判定が満たされて、上記ステップＳ２０に戻り、同様の手順を繰り返す。 In step S80, based on the measurement time of the timer 209, it is determined whether or not the measurement time has exceeded the aforementioned maximum sound wave reception time. If the maximum sound wave receiving time has not elapsed, the determination is not satisfied and the routine returns to step S70 and the same procedure is repeated. If the maximum sound wave receiving time has elapsed, the determination is satisfied, the process returns to step S20, and the same procedure is repeated.
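The waiting logic of steps S60 to S80 can be sketched as a small polling loop. The callables `read_elapsed_ms` and `reflection_heard` are hypothetical stand-ins for the timer 209 and the step S70 determination, the `FakeTimer` class exists only to make the example deterministic, and the default limits are the example values quoted earlier in the description.

```python
def wait_for_reflection(read_elapsed_ms, reflection_heard,
                        min_ms=1.73, max_ms=5.77):
    """Return the elapsed time at which a reflection was heard, or None on timeout."""
    # S60: ignore everything until the minimum sound wave reception time has passed.
    while read_elapsed_ms() < min_ms:
        pass
    while True:
        elapsed = read_elapsed_ms()
        # S70: has the reflected pseudo noise been input?
        if reflection_heard():
            return elapsed
        # S80: give up once the maximum sound wave reception time has passed;
        # the caller would then return to step S20.
        if elapsed >= max_ms:
            return None


class FakeTimer:
    """Deterministic stand-in for timer 209: advances by step_ms per reading."""

    def __init__(self, step_ms):
        self.t = 0.0
        self.step = step_ms

    def __call__(self):
        self.t += self.step
        return self.t


timer = FakeTimer(0.5)
print(wait_for_reflection(timer, lambda: timer.t >= 3.0))
```

In the example the simulated echo arrives once the timer reads 3.0 msec, so that value is returned; if no echo ever arrives, the function returns None after the 5.77 msec limit.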

一方、上記ステップＳ７０において、上記疑似雑音の反射音を入力していた場合には、ステップＳ７０の判定が満たされてステップＳ９０に移る。 On the other hand, when the reflected sound of the pseudo noise is input in step S70, the determination in step S70 is satisfied and the process proceeds to step S90.

ステップＳ９０では、上記ステップＳ７０でマイク２０７、ゲイン可変アンプ２１７、及びＩ／Ｏインタフェイス２０４を介して入力された上記疑似雑音の反射音により、対応する振幅あるいは周波数を含む反射音情報を取得する。 In step S90, the reflected sound information including the corresponding amplitude or frequency is acquired from the reflected sound of the pseudo noise input via the microphone 207, the gain variable amplifier 217, and the I / O interface 204 in step S70. .

その後、ステップＳ１００で、上記ステップＳ９０で取得された反射音情報と、上記ステップＳ６０より計時開始されたタイマ２０９のこの時点での計測時間とに基づき、所定の演算処理（この例では、前述の図１や図７等で説明した上記の式１を用いる手法）を行い、上記反射音の反射元となった対象物が来訪者Ｍであると推測して、来訪者Ｍまでの距離Ｌを検出する（距離検出手段としての機能）。 After that, in step S100, based on the reflected sound information acquired in step S90 and the measurement time at this time of the timer 209 started in step S60, a predetermined calculation process (in this example, the above-described calculation process is performed). The method using the above formula 1 described with reference to FIG. 1 and FIG. 7 is performed, and it is estimated that the target object that is the reflection source of the reflected sound is the visitor M, and the distance L to the visitor M is determined. Detect (function as distance detection means).

そして、ステップＳ１１０で、上記ステップＳ１００での距離検出結果に基づき、検出した来訪者Ｍまでの距離Ｌが上記所定値Ｌ０以下であるか否かを判定する。Ｌ＞Ｌ０の場合には、判定が満たされずステップＳ１２０に移る。 In step S110, based on the distance detection result in step S100, it is determined whether or not the detected distance L to the visitor M is equal to or less than the predetermined value L0. If L> L0, the determination is not satisfied and the routine goes to Step S120.

ステップＳ１２０では、マイク２０７のゲインを、予め定められた所定値以上に固定的に設定する（設定制御手段としての機能）。例えば、マイク２０７のゲインとして制御可能な範囲中の最大値としてもよい。これにより、マイク２０７を介し入力した音声情報は、ゲイン可変アンプ２１７によって、当該設定されたゲインの値により増幅される。その後、上記ステップＳ２０に戻り、同様の手順を繰り返す。 In step S120, the gain of the microphone 207 is fixedly set to a predetermined value or more (function as a setting control unit). For example, the maximum value in a controllable range as the gain of the microphone 207 may be used. As a result, the audio information input via the microphone 207 is amplified by the gain variable amplifier 217 with the set gain value. Then, it returns to said step S20 and repeats the same procedure.

一方、上記ステップＳ１１０において、Ｌ≦Ｌ０であった場合には、判定が満たされてステップＳ１３０に移る。 On the other hand, if L ≦ L0 in step S110, the determination is satisfied, and the routine goes to step S130.

ステップＳ１３０では、上記ステップＳ１００での距離検出結果に基づき、適切な（最良の）マイク２０７のゲインを算出する。 In step S130, an appropriate (best) microphone 207 gain is calculated based on the distance detection result in step S100.

そして、ステップＳ１４０では、上記ステップＳ１３０での算出結果に基づき、マイク２０７のゲインを調整して設定する（感度調整手段としての機能）。これにより、マイク２０７を介し入力した音声情報は、ゲイン可変アンプ２１７によって、当該設定されたゲインにより増幅される。 In step S140, the gain of the microphone 207 is adjusted and set based on the calculation result in step S130 (function as sensitivity adjustment means). As a result, the audio information input via the microphone 207 is amplified by the gain variable amplifier 217 with the set gain.
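Steps S110 to S140 can be summarized in a short sketch. The description does not say how the appropriate gain is computed from the distance in step S130, so the linear mapping below, the decibel units, and the 0 to 30 dB range are purely illustrative assumptions; only the threshold test against L0 (80 cm in the example) and the use of a fixed, high gain beyond L0 follow the text.

```python
def select_gain_db(distance_m: float,
                   threshold_m: float = 0.80,   # L0 from the example in the text
                   min_gain_db: float = 0.0,    # assumed lower limit
                   max_gain_db: float = 30.0) -> float:  # assumed upper limit
    """Pick a microphone gain from the detected distance L."""
    if distance_m > threshold_m:
        # S120: L > L0, so use a fixed gain (here the maximum controllable value).
        return max_gain_db
    # S130/S140: L <= L0, so scale the gain with distance (assumed linear rule:
    # a closer visitor is louder at the microphone and needs less gain).
    ratio = max(distance_m, 0.0) / threshold_m
    return min_gain_db + (max_gain_db - min_gain_db) * ratio


for d in (1.00, 0.80, 0.40, 0.0):
    print(d, select_gain_db(d))
```

The mapping is continuous at L = L0, so the gain does not jump when a visitor crosses the threshold; whether the actual device behaves this way is not stated in the description.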

その後、ステップＳ１５０で、上記調整開始フラグＦｓを、ゲインの調整開始を表すＦｓ＝１とし、ステップＳ１６０に移る。 Thereafter, in step S150, the adjustment start flag Fs is set to Fs = 1 indicating gain adjustment start, and the process proceeds to step S160.

ステップＳ１６０では、受付処理が終了したことを表す受付終了フラグＦｔがＦｔ＝１であるか否かを判定する。Ｆｔ＝０のままである場合（＝受付処理が終了していない場合）は、判定が満たされず上記ステップＳ２０に戻り、同様の手順を繰り返す。そして、Ｆｔ＝１になったら（＝受付処理が終了したら）、判定が満たされてステップＳ１８０に移る。 In step S160, it is determined whether or not a reception end flag Ft indicating that the reception process has ended is Ft = 1. If Ft = 0 remains (= the reception process has not ended), the determination is not satisfied and the routine returns to step S20 and the same procedure is repeated. When Ft = 1 (= when the acceptance process is completed), the determination is satisfied and the routine goes to Step S180.

ステップＳ１８０では、上記ステップＳ１４０でゲインが調整されたマイク２０７を用いて受付処理が終了した後、言い換えれば、後述の図１１のステップＳ３６０で上記受付終了フラグＦｔが受付処理の終了を表すＦｔ＝１になった後、所定期間が経過したか否かを判定する。Ｆｔ＝１になった後、所定期間が経過するまでは判定が満たされずループ待機し、所定期間が経過したら判定が満たされて、ステップＳ１９０に移る。 In step S180, after the reception process is completed using the microphone 207 whose gain has been adjusted in step S140, in other words, in step S360 of FIG. 11 described later, the reception end flag Ft represents the end of the reception process Ft = After becoming 1, it is determined whether or not a predetermined period has elapsed. After Ft = 1, the determination is not satisfied until the predetermined period elapses, and the loop waits. When the predetermined period elapses, the determination is satisfied, and the process proceeds to step S190.

ステップＳ１９０では、マイク２０７のゲインを、直前の上記ステップＳ１４０で設定したゲインに（つまりこの時点でのゲインの値に）固定し、上記ステップＳ１４０での上記ゲインの調整を終了させる（終了制御手段としての機能）。その後、ステップＳ１９５に移る。 In step S190, the gain of the microphone 207 is fixed to the gain set in the previous step S140 (that is, the gain value at this time), and the adjustment of the gain in the step S140 is ended (end control means). As a function). Thereafter, the process proceeds to step S195.

ステップＳ１９５では、上記調整開始フラグＦｓをＦｓ＝０とする。そして、このフローを終了する。なお、このフローは、例えば受付端末２０の電源がＯＮの間、あるいは所定の終了操作がされるまでの間は、所定の時間間隔（例えば２秒間隔）で繰り返し継続して実行される。 In step S195, the adjustment start flag Fs is set to Fs = 0. Then, this flow ends. This flow is repeatedly executed at a predetermined time interval (for example, every 2 seconds) while the receiving terminal 20 is powered on or until a predetermined end operation is performed, for example.

なお、以上において、ステップＳ７０及びステップＳ９０が、各請求項記載の反射音取得手段として機能し、ステップＳ１１０が、開始制御手段として機能する。 In the above, step S70 and step S90 function as reflected sound acquisition means described in each claim, and step S110 functions as start control means.

図１１は、上記図１０のフローと並行して、受付端末２０の制御回路部２００により実行する制御手順を表すフローチャートである。上記図１０のフローが来訪者Ｍまでの距離に基づくマイク２０７のゲインの調整に関するものであるのに対し、この図１１のフローは、受付処理に関するものである。なお、これら図１０及び図１１の２つのフローは、例えばコンピュータのＯＳ等でしばしば行われる「マルチタスク処理」と同様の公知の手法により、前述の来訪者受付処理用のプログラム群に従って、ＣＰＵ２０１によって同時並行処理されるようになっている。 FIG. 11 is a flowchart showing a control procedure executed by the control circuit unit 200 of the reception terminal 20 in parallel with the flow of FIG. The flow in FIG. 10 relates to the adjustment of the gain of the microphone 207 based on the distance to the visitor M, whereas the flow in FIG. 11 relates to the reception process. Note that these two flows in FIGS. 10 and 11 are performed by the CPU 201 in accordance with the above-described program for visitor reception processing by a known method similar to “multitask processing” often performed by a computer OS or the like, for example. It is designed to be processed in parallel.

図１１において、例えば受付端末２０の電源ＯＮによって、このフローが開始される（「ＳＴＡＲＴ」位置）。 In FIG. 11, for example, this flow is started by turning on the power of the receiving terminal 20 (“START” position).

まずステップＳ２００で、上記受付終了フラグＦｔをＦｔ＝０に初期化する。 First, in step S200, the acceptance end flag Ft is initialized to Ft = 0.

その後、ステップＳ２１０で、上記調整開始フラグＦｓがゲインの調整開始を表すＦｓ＝１であるか否かを判定する。Ｆｓ＝０のままである場合（＝ゲインの調整が開始されていない場合）には、判定が満たされずループ待機する。Ｆｓ＝１になったら（＝ゲインの調整が開始されたら）、判定が満たされてステップＳ２２０に移る。 Thereafter, in step S210, it is determined whether or not the adjustment start flag Fs is Fs = 1 indicating the start of gain adjustment. If Fs = 0 remains (= gain adjustment has not been started), the determination is not satisfied and the loop waits. When Fs = 1 (= when gain adjustment is started), the determination is satisfied, and the routine goes to Step S220.

ステップＳ２２０では、来訪者Ｍの発話音声を認識するために、ＤＢサーバ１０の来訪者予約データベース１５１０を参照しつつＨＤＤ２０５の来訪者辞書記憶エリア２５４に記憶された来訪者辞書を取得し、辞書更新を行う。すなわち、前述したように、受付処理が開始された時刻、言い換えれば、上記ステップＳ２１０の判定が満たされた時刻を基準とし、来訪者予約データベース１５１０の全予約データのうちその前後所定時間以内（例えば１時間以内等）を訪問予定日時とする予約データに基づいて作成された辞書を取得する。 In step S220, the visitor dictionary stored in the visitor dictionary storage area 254 of the HDD 205 is acquired while referring to the visitor reservation database 1510 of the DB server 10 in order to recognize the speech voice of the visitor M, and the dictionary is updated. I do. That is, as described above, with reference to the time at which the reception process is started, in other words, the time at which the determination in step S210 is satisfied, within a predetermined time before and after the reservation data in the visitor reservation database 1510 (for example, A dictionary created on the basis of the reservation data with a scheduled visit date and time within one hour or the like) is acquired.
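The selection of reservation data in step S220 can be sketched as a time-window filter. The record layout (dictionaries with `name` and `visit_at` keys) and the function name are hypothetical; the one-hour window on either side of the reception start time is the example given in the text.

```python
from datetime import datetime, timedelta


def select_reservations(reservations, started_at, window=timedelta(hours=1)):
    """Keep reservations whose scheduled visit is within +/- window of started_at."""
    return [r for r in reservations
            if abs(r["visit_at"] - started_at) <= window]


# Hypothetical reservation records for illustration only.
reservations = [
    {"name": "Visitor A", "visit_at": datetime(2009, 3, 2, 10, 0)},
    {"name": "Visitor B", "visit_at": datetime(2009, 3, 2, 10, 45)},
    {"name": "Visitor C", "visit_at": datetime(2009, 3, 2, 14, 0)},
]
started_at = datetime(2009, 3, 2, 10, 30)
candidates = select_reservations(reservations, started_at)
print([r["name"] for r in candidates])
```

Restricting the dictionary to visitors expected around the current time keeps the recognition vocabulary small, which is presumably why the description rebuilds it at the start of each reception.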

そして、ステップＳ２３０に移り、Ｉ／Ｏインタフェイス２０４を介してスピーカ２０８へ音声信号を出力し、”いらっしゃいませ。どちら様でしょうか。（マイクに向かってお名前を入力してください）”という来訪者氏名を問いかける台詞を含む案内音声を出力させる。なお、このとき、表示部２１０に台詞と同様の内容のテキストを含む表示画面を表示させてもよい（後述のステップＳ２６０、ステップＳ３１０、ステップＳ３３０、及びステップＳ３４０についても同様）。 Then, the process proceeds to step S230, where an audio signal is output to the speaker 208 via the I/O interface 204 so that a guidance voice is output containing a line asking for the visitor's name: "Welcome. May I ask who you are? (Please say your name into the microphone.)" At this time, a display screen containing text with the same content as the line may be displayed on the display unit 210 (the same applies to step S260, step S310, step S330, and step S340 described later).

その後、ステップＳ２４０で、この問いかけに対応して発話した来訪者Ｍの発話音声の情報をマイク２０７、ゲイン可変アンプ２１７、及びＩ／Ｏインタフェイス２０４を介して入力し（このときのゲイン可変アンプ２１７での増幅は、上記図１０のステップＳ１４０で調整されたゲインにより実行される）、上記ステップＳ２２０で取得した来訪者辞書を用いて音声認識を行う。 Thereafter, in step S240, information on the voice of the visitor M who has spoken in response to this inquiry is input via the microphone 207, the gain variable amplifier 217, and the I / O interface 204 (the gain variable amplifier at this time). (Amplification in 217 is executed with the gain adjusted in step S140 in FIG. 10), and voice recognition is performed using the visitor dictionary acquired in step S220.

そして、ステップＳ２５０に移り、上記入力した来訪者Ｍの発話音声情報が音声認識できたか否かを判定する。言語として音声認識できなかった場合にはステップＳ２５０の判定が満たされず、”音声を認識できませんでした。もう一度マイクに向かってお名前を入力してください”という音声認識ができなかったことを来訪者Ｍに通知する台詞を含む案内音声をスピーカ２０８に出力させ、上記ステップＳ２４０に戻り、同様の手順を繰り返す。なお、この図１１では図示を省略しているが、上記案内音声の出力は、予め定められた設定回数だけ行われ、その間に音声認識できない場合には、対応する処理（例えば部署代表やその他の受付担当者等に取り次ぐ等）を行う（後述のステップＳ２８０、ステップＳ３００、ステップＳ３３０、及びステップＳ３４０についても同様）。一方、言語として音声認識できた場合には、ステップＳ２５０の判定が満たされステップＳ２６０に移る。 Then, the process proceeds to step S250, where it is determined whether the input speech information of the visitor M could be recognized. If it could not be recognized as language, the determination in step S250 is not satisfied; a guidance voice including a line notifying the visitor M that recognition failed ("The speech could not be recognized. Please speak your name into the microphone again.") is output from the speaker 208, the process returns to step S240, and the same procedure is repeated. Although not shown in FIG. 11, this guidance voice is output only up to a predetermined number of times, and if the speech still cannot be recognized within that limit, corresponding handling is performed (for example, handing over to a department representative or another receptionist) (the same applies to steps S280, S300, S330, and S340 described later). On the other hand, if the speech could be recognized as language, the determination in step S250 is satisfied and the process proceeds to step S260.
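
The bounded retry described for steps S240 and S250 might look like the sketch below. The callables `listen`, `recognize`, and `speak`, and the default of three retries, are assumptions made for illustration; the description only says that the retry prompt is issued a predetermined number of times before the visitor is handed over to a person.

```python
def ask_until_recognized(listen, recognize, speak, retry_prompt, max_retries=3):
    """Capture and recognize speech, re-prompting at most max_retries times.

    Returns the recognized text, or None when every attempt failed, in which
    case the caller is expected to hand over to a human receptionist.
    """
    for attempt in range(max_retries + 1):
        text = recognize(listen())      # S240: capture and recognize
        if text is not None:            # S250: recognized as language
            return text
        if attempt < max_retries:
            speak(retry_prompt)         # notify the visitor and try again
    return None
```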

ステップＳ２６０では、Ｉ／Ｏインタフェイス２０４を介してスピーカ２０８へ音声信号を出力し、”マイクに向かって担当者名を入力してください”という担当者名を問いかける台詞を含む案内音声を出力させる。 In step S260, an audio signal is output to the speaker 208 via the I/O interface 204 so that a guidance voice is output that includes a line asking for the name of the person in charge: "Please speak the name of the person in charge into the microphone."

その後、ステップＳ２７０で、この問いかけに対応して発話した来訪者Ｍの発話音声情報をマイク２０７、ゲイン可変アンプ２１７、及びＩ／Ｏインタフェイス２０４を介して入力し（上記同様、このときのゲイン可変アンプ２１７での増幅は、上記図１０のステップＳ１４０で調整されたゲインにより実行される）、上記ステップＳ２４０と同様の方法により音声認識を行う。 Thereafter, in step S270, the speech voice information of the visitor M who has spoken in response to this question is input via the microphone 207, the gain variable amplifier 217, and the I / O interface 204 (same as above, the gain at this time). The amplification by the variable amplifier 217 is executed by the gain adjusted in step S140 of FIG. 10), and voice recognition is performed by the same method as in step S240.

そして、ステップＳ２８０に移り、上記入力した来訪者Ｍの発話音声情報が音声認識できたか否かを判定する。言語として音声認識できなかった場合にはステップＳ２８０の判定が満たされず、”音声を認識できませんでした。もう一度マイクに向かって担当者名を入力してください”という音声認識ができなかったことを来訪者Ｍに通知する台詞を含む案内音声をスピーカ２０８に出力させ、上記ステップＳ２７０に戻り、同様の手順を繰り返す。言語として音声認識できた場合には、判定が満たされステップＳ２９０に移る。 Then, the process proceeds to step S280, where it is determined whether the input speech information of the visitor M could be recognized. If it could not be recognized as language, the determination in step S280 is not satisfied; a guidance voice including a line notifying the visitor M that recognition failed ("The speech could not be recognized. Please speak the name of the person in charge into the microphone again.") is output from the speaker 208, the process returns to step S270, and the same procedure is repeated. If the speech could be recognized as language, the determination is satisfied and the process proceeds to step S290.

ステップＳ２９０では、ＤＢサーバ１０の来訪者予約データベース１５１０にアクセスし、上記ステップＳ２４０及びステップＳ２７０において音声情報の音声認識により取得した来訪者名及び担当者名が、来訪者予約データベース１５１０のいずれかの予約データの「来訪者名」「担当者名」と一致するか否かを照合する。なお、完全な一致ではなく、ある類似幅、許容幅を持たせた範囲内で適合するかどうかを照合するようにしてもよい。 In step S290, the visitor reservation database 1510 of the DB server 10 is accessed, and the visitor name and the person-in-charge name obtained by the voice recognition of the voice information in the above step S240 and step S270 are either one of the visitor reservation database 1510. It is checked whether or not it matches the “visitor name” and “person in charge name” of the reservation data. In addition, it is possible to collate whether or not the matching is within a range having a certain similarity width and an allowable width instead of a perfect match.
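
The tolerant matching mentioned for step S290 can be illustrated with a string-similarity threshold. This is a sketch under stated assumptions: the description does not say how the "similarity width" is measured, so the use of `difflib.SequenceMatcher`, the 0.8 threshold, and the record fields are all hypothetical.

```python
from difflib import SequenceMatcher

def similar(a, b, threshold=0.8):
    """True when the two strings are at least `threshold` similar (0.0 to 1.0)."""
    return SequenceMatcher(None, a, b).ratio() >= threshold

def find_reservation(visitor, staff, reservations, threshold=0.8):
    """Return the first reservation whose visitor and staff names both match."""
    for r in reservations:
        if similar(visitor, r["visitor"], threshold) and similar(staff, r["staff"], threshold):
            return r
    return None  # S300: no matching reservation
```

A threshold below 1.0 tolerates small recognition errors in a name, at the cost of possibly matching the wrong reservation when two names are close.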

その後、ステップＳ３００では、上記ステップＳ２９０での照合結果が一致したか（適合したか）否かを判定する。ステップＳ２９０での照合結果が一致しなかった（該当する来訪者名及び担当者名の予約データが存在しなかった）場合には判定が満たされず、”予約がされていませんでした。もう一度マイクに向かってお名前を入力してください”という予約データが存在しなかったことを来訪者Ｍに通知する台詞を含む案内音声をスピーカ２０８に出力させ、上記ステップＳ２４０に戻り、同様の手順を繰り返す。ステップＳ２９０での照合結果が一致した（該当する来訪者名及び担当者名の予約データが存在した）場合には、判定が満たされステップＳ３１０に移る。 Thereafter, in step S300, it is determined whether or not the collation result in step S290 matches (matches). If the collation result in step S290 does not match (there is no reservation data for the corresponding visitor name and person in charge), the determination is not satisfied, and “Reservation has not been made. The guidance voice including the dialogue notifying the visitor M that the reservation data “please input your name” does not exist is output to the speaker 208, the process returns to step S240, and the same procedure is repeated. If the collation results in step S290 match (there are reservation data for the corresponding visitor name and person in charge), the determination is satisfied and the routine goes to step S310.

ステップＳ３１０では、Ｉ／Ｏインタフェイス２０４を介してスピーカ２０８へ音声信号を出力し、予約内容とともに、”この内容でよろしいでしょうか。（よろしければ、「はい」を、間違っていたら、「いいえ」をマイクに向かって入力してください）”という最終確認を問いかける台詞を含む案内音声を出力させる。 In step S310, an audio signal is output to the speaker 208 via the I/O interface 204 so that, together with the reservation details, a guidance voice is output that includes a line asking for final confirmation: "Is this correct? (If it is correct, say 'Yes' into the microphone; if it is wrong, say 'No'.)"

そして、ステップＳ３２０に移り、この問いかけに対応して発話した来訪者Ｍの音声情報をマイク２０７、ゲイン可変アンプ２１７、及びＩ／Ｏインタフェイス２０４を介して入力し（上記同様、このときのゲイン可変アンプ２１７での増幅は、上記図１０のステップＳ１４０で調整されたゲインにより実行される）、上記ステップＳ２４０と同様の方法により音声認識を行う。 Then, the process proceeds to step S320, where the voice information of the visitor M who has spoken in response to this question is input via the microphone 207, the gain variable amplifier 217, and the I / O interface 204 (same as above, the gain at this time). The amplification by the variable amplifier 217 is executed by the gain adjusted in step S140 of FIG. 10), and voice recognition is performed by the same method as in step S240.

その後、ステップＳ３３０で、上記入力した来訪者Ｍからの音声情報が音声認識できたか否かを判定する。言語として音声認識できなかった場合にはステップＳ３３０の判定が満たされず、”音声を認識できませんでした。もう一度マイクに向かって「はい」、「いいえ」を入力してください”という音声認識ができなかったことを来訪者Ｍに通知する台詞を含む案内音声をスピーカ２０８に出力させ、上記ステップＳ３２０に戻り、同様の手順を繰り返す。言語として音声認識できた場合には、判定が満たされステップＳ３４０に移る。 Thereafter, in step S330, it is determined whether the input speech information from the visitor M could be recognized. If it could not be recognized as language, the determination in step S330 is not satisfied; a guidance voice including a line notifying the visitor M that recognition failed ("The speech could not be recognized. Please say 'Yes' or 'No' into the microphone again.") is output from the speaker 208, the process returns to step S320, and the same procedure is repeated. If the speech could be recognized as language, the determination is satisfied and the process proceeds to step S340.

ステップＳ３４０では、上記ステップＳ３２０において音声情報の音声認識により取得した情報（「はい」又は「いいえ」）が、「はい」であったか否かを判定する。「いいえ」であった場合には判定が満たされず、”操作を最初からやり直します。もう一度マイクに向かってお名前を入力してください”という操作のやり直しを来訪者Ｍに通知する台詞を含む案内音声をスピーカ２０８に出力させ、上記ステップＳ２３０に戻り、同様の手順を繰り返す。「はい」であった場合には、判定が満たされステップＳ３５０に移る。 In step S340, it is determined whether the information ("Yes" or "No") obtained by speech recognition in step S320 was "Yes". If it was "No", the determination is not satisfied; a guidance voice including a line notifying the visitor M that the operation will be redone ("The operation will restart from the beginning. Please speak your name into the microphone again.") is output from the speaker 208, the process returns to step S230, and the same procedure is repeated. If it was "Yes", the determination is satisfied and the process proceeds to step S350.

ステップＳ３５０では、正当な来訪者Ｍが訪ねてきたことが確認できたことに対応して、対応する担当者のＩＰ電話機６０に発信（コール）を行う。具体的には、担当者への通知文を作成し、その通知文のテキストデータを音声データに変換し、上記予約データにより特定された担当者の電話番号を用いて、ＩＰ−ＰＢＸ５０を介し、担当者の使用するＩＰ電話機６０に、音声データを発信する。 In step S350, in response to confirming that the legitimate visitor M has visited, a call is made to the IP phone 60 of the corresponding person in charge. Specifically, a notice sentence to the person in charge is created, the text data of the notice sentence is converted into voice data, and the telephone number of the person in charge specified by the reservation data is used to transmit the notice sentence via the IP-PBX 50. Voice data is transmitted to the IP telephone 60 used by the person in charge.

そして、ステップＳ３６０で、上記受付終了フラグＦｔを受付処理の終了を表すＦｔ＝１にした後、このフローを終了する。なお、このフローは、例えば受付端末２０の電源がＯＮの間、あるいは所定の終了操作がされるまでの間は、所定の時間間隔（例えば２秒間隔）で繰り返し継続して実行される。 Then, in step S360, the reception end flag Ft is set to Ft = 1, which indicates the end of the reception process, and this flow ends. This flow is executed repeatedly at a predetermined time interval (for example, every 2 seconds), for example while the reception terminal 20 is powered on or until a predetermined end operation is performed.

以上説明したように、本実施形態の受付端末２０においては、距離検出用の検出音（上記の例では疑似雑音）を用いて来訪者Ｍとの距離を検出する。すなわち、疑似雑音生成時の生成元となる雑音がマイク２０７を介して入力されると、対応する振幅を含む雑音情報を取得し（図１０のステップＳ２０参照）、これに基づく疑似雑音をスピーカ２０８を介し出力する（図１０のステップＳ４０参照）。そして、出力された疑似雑音の反射音情報に基づき、来訪者Ｍまでの距離を検出する（図１０のステップＳ１００参照）。 As described above, the reception terminal 20 of the present embodiment detects the distance to the visitor M using a detection sound for distance detection (pseudo noise in the above example). That is, when the noise that serves as the source for generating the pseudo noise is input via the microphone 207, noise information including the corresponding amplitude is acquired (see step S20 in FIG. 10), and pseudo noise based on that information is output via the speaker 208 (see step S40 in FIG. 10). Then, the distance to the visitor M is detected based on the reflected sound information of the output pseudo noise (see step S100 in FIG. 10).

そして、検出された来訪者Ｍまでの距離に基づき、距離に応じた適切な（最良の）マイク２０７のゲインを算出し、ゲイン調整を行う（図１０のステップＳ１４０参照）。すなわち、来訪者Ｍまでの距離が比較的近い場合には、対話時の来訪者Ｍの発話音声が比較的大きいレベルで入力されることから、上記ゲインを低く設定する。一方、来訪者Ｍまでの距離が比較的遠い場合には、対話時の来訪者Ｍの発話音声が比較的小さいレベルで入力されることから、上記ゲインを高く設定する。これにより、適切な信号レベルで受付処理を行うことができる。この結果、認識漏れのない、確実な受付処理を行うことができる。 Then, based on the detected distance to the visitor M, an appropriate (best) microphone 207 gain corresponding to the distance is calculated and gain adjustment is performed (see step S140 in FIG. 10). That is, when the distance to the visitor M is relatively close, the utterance voice of the visitor M at the time of dialogue is input at a relatively high level, and thus the gain is set low. On the other hand, when the distance to the visitor M is relatively long, the utterance voice of the visitor M at the time of dialogue is input at a relatively small level, and thus the gain is set high. Thereby, the reception process can be performed with an appropriate signal level. As a result, it is possible to perform reliable reception processing with no recognition failure.

以上のように、本実施形態の受付端末２０によれば、マイク２０７及びスピーカ２０８を介して入出力する音を用いて、来訪者Ｍまでの距離を検出することができる。すなわち、受付処理のためにもともと備わっているマイク２０７やスピーカ２０８を活用することで、それ以外の別途の距離検出用のセンサや専用マイク等を新たに設けることなく距離検出を行うことができ、それに基づく適切なゲインで確実な受付処理を行うことができる。 As described above, according to the reception terminal 20 of the present embodiment, the distance to the visitor M can be detected using the sound input / output via the microphone 207 and the speaker 208. That is, by utilizing the microphone 207 and the speaker 208 that are originally provided for the reception process, distance detection can be performed without newly providing a separate sensor or a dedicated microphone for distance detection, Reliable reception processing can be performed with an appropriate gain based thereon.

また、本実施形態では特に、マイク２０７で入力された雑音により、雑音情報を取得し、この取得した雑音情報に基づき生成した疑似雑音を、スピーカ２０８を介し出力する。このように、距離検出時に、マイク２０７で入力した雑音に基づいて疑似雑音を生成し利用することにより、音を用いて検出していることを来訪者Ｍに悟られることなく、距離検出を行うことができる。 In the present embodiment in particular, noise information is acquired from the noise input through the microphone 207, and pseudo noise generated based on the acquired noise information is output via the speaker 208. By generating and using pseudo noise based on the noise picked up by the microphone 207 in this way, the distance can be detected without the visitor M noticing that sound is being used for the detection.

また、受付端末２０において対話方式で操作を行うとき、来訪者Ｍは、受付端末２０に比較的近づいてから発話を行うのが一般的であり、受付端末２０から来訪者Ｍまでの距離が遠い場合には、来訪者Ｍが操作を開始する可能性は低い。そこでこれに対応し、本実施形態では特に、上記図１０のステップＳ１００で検出した来訪者Ｍまでの距離Ｌが所定値Ｌ０以下となってから、上記ゲインの調整を開始する。すなわち、検出した来訪者Ｍまでの距離が比較的近くなってから上記ゲインの調整を開始することにより、無駄な調整動作を回避し、効率的な処理を行うことができる。 When operating the reception terminal 20 interactively, the visitor M generally speaks only after coming relatively close to the reception terminal 20, and when the distance from the reception terminal 20 to the visitor M is long, the visitor M is unlikely to start an operation. Accordingly, in the present embodiment in particular, the gain adjustment is started only after the distance L to the visitor M detected in step S100 in FIG. 10 has become equal to or less than a predetermined value L0. That is, by starting the gain adjustment after the detected distance to the visitor M has become relatively short, useless adjustment operations are avoided and processing is made more efficient.

また、受付端末２０から来訪者Ｍまでの距離が遠い場合、来訪者Ｍが対話操作を行う可能性は低いが、来訪者Ｍによっては（あるいは状況によっては）比較的遠い距離のまま、発話を行う可能性もある。このとき、受付端末２０から来訪者Ｍまでの距離が遠いため、そのままでは来訪者Ｍの発話音声が比較的小さいレベルでマイク２０７より入力されることとなる。そこでこれに対応して、本実施形態では特に、上記図１０のステップＳ１００で検出した来訪者Ｍまでの距離Ｌが所定値Ｌ０より大きい場合、マイク２０７のゲインを所定値以上に設定し（図１０のステップＳ１２０参照）、信号レベルをある程度まで増大させる。これにより、上記のような遠方からの対話操作時においても、認識漏れの可能性を低減することができる。 When the distance from the reception terminal 20 to the visitor M is long, the visitor M is unlikely to perform a dialogue operation, but depending on the visitor M (or on the situation) the visitor may speak while still relatively far away. In that case, because the visitor M is far from the reception terminal 20, the speech of the visitor M would be input through the microphone 207 at a relatively low level if nothing were done. Accordingly, in the present embodiment in particular, when the distance L to the visitor M detected in step S100 in FIG. 10 is larger than the predetermined value L0, the gain of the microphone 207 is set to a predetermined value or more (see step S120 in FIG. 10) so that the signal level is raised to some extent. This reduces the possibility of missed recognition even during such a dialogue operation from a distance.
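
The two gain regimes (distance at or below L0, and distance above L0) can be summarized in a short sketch. The linear gain law, the function name, and every numeric value below are assumptions for illustration; the description only states that a closer visitor calls for a lower gain, a farther visitor for a higher gain, and that beyond L0 the gain is set to a predetermined value or more.

```python
def choose_gain(distance_m, l0_m=1.5, far_gain=8.0,
                ref_distance_m=0.5, ref_gain=1.0, max_gain=8.0):
    """Pick a microphone gain from the detected distance (hypothetical policy).

    Beyond L0 a fixed, high gain is used (cf. step S120). At or below L0 the
    gain grows linearly with distance (cf. step S140), on the assumption that
    free-field sound pressure falls off roughly as 1/distance.
    """
    if distance_m > l0_m:
        return far_gain
    gain = ref_gain * (distance_m / ref_distance_m)
    return min(max(gain, ref_gain), max_gain)
```

For instance, with these assumed defaults a visitor at 1.0 m gets twice the reference gain, while any visitor beyond 1.5 m gets the fixed far-field gain.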

また、来訪者Ｍとの距離に応じたマイク２０７のゲイン調整後に受付処理が行われ、その受付処理が終了してしばらくたった場合には、対話していた来訪者Ｍは既に別の場所に移動し、受付端末２０近傍に誰もいない状態（図９（ｂ）の状態）になっている可能性が高い。これに対応して、本実施形態では特に、ゲインが調整されたマイク２０７を用いて受付処理が終了した後、所定期間が経過したら、上記ゲインの調整を終了する（図１０のステップＳ１９０参照）。すなわち、この場合、受付処理終了後所定期間が経過したら、直前に設定した上記ゲインに固定して、上記ゲインの調整を終了し、調整を行わないようにする。これにより、この時点でのゲインの値により次の来訪者Ｍの待ち受け状態を確実に実現することができる。 The reception process is performed after the gain of the microphone 207 has been adjusted according to the distance to the visitor M. Some time after that reception process has ended, it is highly likely that the visitor M who was in conversation has already moved elsewhere and that nobody is near the reception terminal 20 (the state shown in FIG. 9B). Accordingly, in the present embodiment in particular, the gain adjustment is ended when a predetermined period has elapsed after the reception process using the gain-adjusted microphone 207 has finished (see step S190 in FIG. 10). That is, once the predetermined period has elapsed after the end of the reception process, the gain is fixed at the most recently set value, the gain adjustment is ended, and no further adjustment is performed. In this way, the terminal reliably enters a standby state for the next visitor M with the gain value held at that point.

なお、本発明は、上記実施形態に限られるものではなく、その趣旨及び技術的思想を逸脱しない範囲内で種々の変形が可能である。以下、そのような変形例を順を追って説明する。 The present invention is not limited to the above-described embodiment, and various modifications can be made without departing from the spirit and technical idea of the present invention. Hereinafter, such modifications will be described in order.

（１）ソース音として来訪者の発話音声を利用する場合 (1) Using the visitor's speech as the source sound
上記実施形態においては、距離検出用の検出音（上記の例では疑似雑音）の生成元となるソース音として、受付端末２０の周囲で発生した雑音を利用していたが、これに限られない。すなわち、上記ソース音として、受付処理において来訪者Ｍが発声した発話音声を利用するようにしてもよい。 In the above embodiment, noise generated around the reception terminal 20 was used as the source sound from which the detection sound for distance detection (pseudo noise in the above example) is generated, but the invention is not limited to this. That is, the speech uttered by the visitor M during the reception process may be used as the source sound.

図１２（ａ）及び図１２（ｂ）は、本変形例において、スピーカ２０８より疑似発話音声を出力するまでの手順の概要を説明した説明図である。 FIG. 12A and FIG. 12B are explanatory diagrams illustrating an outline of a procedure until the pseudo utterance voice is output from the speaker 208 in the present modification.

本変形例の受付端末２０では、上記実施形態と異なり、例えば来訪者Ｍによって図示しない操作部が適宜に操作されることにより、受付処理が開始される。そして、図１２（ａ）に示すように、受付処理中において、来訪者Ｍが受付端末２０に対して発話すると、来訪者Ｍより発声されたソース音としての発話音声が伝搬してマイク２０７に入力される。これにより、入力した来訪者Ｍの発話音声に対応する振幅（又は周波数でもよい）を含むソース音情報としての発話音声情報が取得され、この発話音声情報に基づき、疑似発話音声情報が生成される。 In the reception terminal 20 of this modification, unlike the above embodiment, the reception process is started when, for example, the visitor M operates an operation unit (not shown). Then, as shown in FIG. 12A, when the visitor M speaks to the reception terminal 20 during the reception process, the speech uttered by the visitor M propagates as the source sound and is input to the microphone 207. As a result, speech information is acquired as source sound information including the amplitude (or, alternatively, the frequency) corresponding to the input speech of the visitor M, and pseudo speech information is generated based on this speech information.

そして、図１２（ｂ）に示すように、上記のようにして疑似発話音声情報が生成されると、この疑似発話音声情報に基づき、疑似発話音声（検出音）がスピーカ２０８より出力される。なお、疑似発話音声は、当該疑似発話音声の生成元となった来訪者Ｍの発話音声がマイク２０７に入力されてから所定時間（例えば、１［ｍｓｅｃ］）以内にスピーカ２０８より出力されるようになっている。その後の処理は上記実施形態の図７及び図８とほぼ同様であり、疑似発話音声の反射音が入力されることで、対応する反射音情報が取得され、来訪者Ｍまでの距離検出が行われる。そして、検出された距離に応じてマイク２０７のゲインが調整される。 Then, as shown in FIG. 12B, when the pseudo speech information has been generated as described above, pseudo speech (the detection sound) is output from the speaker 208 based on this pseudo speech information. The pseudo speech is output from the speaker 208 within a predetermined time (for example, 1 [msec]) after the speech of the visitor M from which it was generated is input to the microphone 207. The subsequent processing is almost the same as in FIGS. 7 and 8 of the above embodiment: when the reflected sound of the pseudo speech is input, the corresponding reflected sound information is acquired and the distance to the visitor M is detected. The gain of the microphone 207 is then adjusted according to the detected distance.

図１３は、本変形例における受付端末２０の制御回路部２００により実行する制御手順を表すフローチャートであり、前述の図１０に対応する図である。図１０と同等の部分には同符号を付し適宜説明を省略する。 FIG. 13 is a flowchart showing a control procedure executed by the control circuit unit 200 of the reception terminal 20 in this modification, and corresponds to FIG. 10 described above. Portions equivalent to those in FIG. 10 are denoted by the same reference numerals and description thereof is omitted as appropriate.

図１３において、まず、上記図１０のステップＳ１０に対応したステップＳ１０′で、所定の初期化処理を実行する（フラグＦｓを用いないためＦｓ＝０の初期化がない点が図１０のステップＳ１０と異なる）。そして、新たに設けたステップＳ１５に移り、受付処理が開始されたことを表す前述の受付開始フラグＦｍがＦｍ＝１であるか否かを判定する。Ｆｍ＝０のままである場合（＝受付処理が開始されていない又は終了されている場合）は、判定が満たされずループ待機し、Ｆｍ＝１になったら（＝受付処理が開始されたら）、判定が満たされて、上記図１０のステップＳ２０に対応したステップＳ２０′に移る。 In FIG. 13, first, in step S10′ corresponding to step S10 in FIG. 10, a predetermined initialization process is executed (this differs from step S10 in FIG. 10 in that Fs = 0 is not initialized, because the flag Fs is not used). The process then proceeds to newly provided step S15, where it is determined whether the reception start flag Fm, which indicates that the reception process has started, is Fm = 1. While Fm = 0 (that is, the reception process has not started or has already ended), the determination is not satisfied and the loop waits. When Fm = 1 (that is, the reception process has started), the determination is satisfied and the process proceeds to step S20′ corresponding to step S20 in FIG. 10.

ステップＳ２０′では、来訪者Ｍにより受付端末２０の受付処理（後述の図１４参照）において発声され、距離検出用の疑似発話音声の生成元となる発話音声をマイク２０７、ゲイン可変アンプ２１７、及びＩ／Ｏインタフェイス２０４を介して入力し、対応する振幅あるいは周波数を含む発話音声情報を取得する（ソース音取得手段としての機能）。 In step S20′, the speech uttered by the visitor M during the reception process of the reception terminal 20 (see FIG. 14 described later), which serves as the source of the pseudo speech for distance detection, is input via the microphone 207, the variable gain amplifier 217, and the I/O interface 204, and speech information including the corresponding amplitude or frequency is acquired (function as source sound acquisition means).

その後、上記図１０のステップＳ３０に対応したステップＳ３０′で、上記ステップＳ２０′で取得した発話音声情報に所定の処理を行い、疑似発話音声情報を生成する。 Thereafter, in step S30 ′ corresponding to step S30 in FIG. 10, predetermined processing is performed on the utterance voice information acquired in step S20 ′ to generate pseudo utterance voice information.

そして、上記図１０のステップＳ４０に対応したステップＳ４０′に移り、Ｉ／Ｏインタフェイス２０４を介してスピーカ２０８に上記生成した疑似発話音声情報を出力し、スピーカ２０８より疑似発話音声を出力させる（検出音出力手段としての機能）。 Then, the process proceeds to step S40 ′ corresponding to step S40 in FIG. 10, and the generated pseudo speech information is output to the speaker 208 via the I / O interface 204, and the pseudo speech is output from the speaker 208 ( Function as detection sound output means).

その後のステップＳ５０及びステップＳ６０は、前述の図１０と同様であり、タイマ２０９を起動させ、タイマ２０９の測定時間が上記最小音波受音時間を経過するまで待機し、最小音波受音時間を経過したら、上記図１０のステップＳ７０に対応したステップＳ７０′に移る。 Subsequent steps S50 and S60 are the same as in FIG. 10 described above. The timer 209 is started, the measurement time of the timer 209 waits until the minimum sound wave reception time elapses, and the minimum sound wave reception time elapses. Then, the process proceeds to step S70 ′ corresponding to step S70 in FIG.

ステップＳ７０′では、マイク２０７、ゲイン可変アンプ２１７、及びＩ／Ｏインタフェイス２０４を介して、上記疑似発話音声の（対象物での）反射音を入力した否かを判定する。この判定は、上記疑似発話音声とマイク２０７、ゲイン可変アンプ２１７、及びＩ／Ｏインタフェイス２０４を介して入力した音声の情報とのパワースペクトルを比較する等の公知の手法により行えば足りる。上記疑似発話音声の反射音を入力していない場合には、判定が満たされずステップＳ８０に移る。 In step S70 ′, it is determined whether or not the reflected sound (on the object) of the pseudo utterance voice is input via the microphone 207, the gain variable amplifier 217, and the I / O interface 204. This determination may be performed by a known method such as comparing the power spectrum of the pseudo speech and the information of the voice input via the microphone 207, the gain variable amplifier 217, and the I / O interface 204. If the reflected sound of the pseudo utterance voice is not input, the determination is not satisfied and the routine goes to Step S80.
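
One way to picture the power-spectrum comparison mentioned for step S70′ is shown below. This is only a sketch: the description calls the comparison a known technique without giving details, so the naive DFT, the cosine-similarity measure, and the 0.9 threshold are all assumptions, and the function names are hypothetical.

```python
import cmath
import math

def power_spectrum(samples):
    """Power spectrum of a real signal using a naive DFT (fine for short frames)."""
    n = len(samples)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))) ** 2
            for k in range(n // 2 + 1)]

def spectral_similarity(a, b):
    """Cosine similarity between the power spectra of two equal-length frames."""
    pa, pb = power_spectrum(a), power_spectrum(b)
    dot = sum(x * y for x, y in zip(pa, pb))
    na = math.sqrt(sum(x * x for x in pa))
    nb = math.sqrt(sum(y * y for y in pb))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)

def is_reflection(emitted, received, threshold=0.9):
    """Treat the received frame as a reflection when the spectra match closely."""
    return spectral_similarity(emitted, received) >= threshold
```

Because the power spectrum discards phase and the cosine similarity discards overall level, a delayed and attenuated copy of the emitted sound still scores close to 1.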

ステップＳ８０は、前述の図１０と同様であり、タイマ２０９の測定時間が上記最大音波受音時間を経過したか否かを判定し、最大音波受音時間を経過していない場合には、上記ステップＳ７０′に戻り、最大音波受音時間を経過した場合には、上記ステップＳ２０′に戻る。 Step S80 is the same as in FIG. 10 described above, and it is determined whether or not the measurement time of the timer 209 has passed the maximum sound wave reception time. Returning to step S70 ', if the maximum sound wave receiving time has elapsed, the process returns to step S20'.
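
The minimum and maximum sound reception times used in steps S60 and S80 bound the delay at which a reflection is accepted. A minimal sketch of how such a window could relate to distance is given below, assuming a simple round-trip time-of-flight model; the actual calculation of the embodiment is described elsewhere in the specification, and the function names and the 343 m/s figure (approximate speed of sound in air at about 20 °C) are assumptions.

```python
SPEED_OF_SOUND_M_S = 343.0  # approximate speed of sound in air at about 20 °C

def echo_distance(round_trip_s):
    """Distance to the reflecting object from the round-trip delay of the echo."""
    return SPEED_OF_SOUND_M_S * round_trip_s / 2.0

def reception_window(min_distance_m, max_distance_m):
    """Earliest and latest echo delays for objects between the two distances."""
    return (2.0 * min_distance_m / SPEED_OF_SOUND_M_S,
            2.0 * max_distance_m / SPEED_OF_SOUND_M_S)
```

Under this model, echoes arriving before the lower bound would come from objects closer than the minimum distance of interest, and the measurement restarts once the upper bound has passed without a reflection.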

一方、上記ステップＳ７０′において、上記疑似発話音声の反射音を入力していた場合には、ステップＳ７０′の判定が満たされて、上記図１０のステップＳ９０に対応したステップＳ９０′に移る。 On the other hand, if the reflected sound of the pseudo utterance voice is input in step S70 ′, the determination in step S70 ′ is satisfied, and the process proceeds to step S90 ′ corresponding to step S90 in FIG.

ステップＳ９０′では、上記ステップＳ７０′でマイク２０７、ゲイン可変アンプ２１７、及びＩ／Ｏインタフェイス２０４を介して入力された上記疑似発話音声の反射音により、対応する反射音情報を取得する。 In step S90 ′, the corresponding reflected sound information is acquired from the reflected sound of the pseudo utterance speech input via the microphone 207, the variable gain amplifier 217, and the I / O interface 204 in step S70 ′.

その後のステップＳ１００〜ステップＳ１４０は、前述の図１０と同様である。ステップＳ１４０で、マイク２０７のゲインを調整したら、上記図１０のステップＳ１６０に対応したステップＳ１６０′に移る。 The subsequent steps S100 to S140 are the same as in FIG. 10 described above. Once the gain of the microphone 207 has been adjusted in step S140, the process proceeds to step S160′ corresponding to step S160 in FIG. 10.

ステップＳ１６０′では、上記受付開始フラグＦｍがＦｍ＝０に戻っているか否かを判定する。Ｆｍ＝１のままである場合（＝受付処理がまだ実行中である場合）は、判定が満たされず上記ステップＳ２０′に戻り、同様の手順を繰り返す。そして、Ｆｍ＝０に戻ったら（＝受付処理が終了したら）、判定が満たされて上記図１０のステップＳ１８０に対応したステップＳ１８０′に移る。 In step S160 ′, it is determined whether or not the reception start flag Fm has returned to Fm = 0. If Fm = 1 remains (= the accepting process is still being executed), the determination is not satisfied and the routine returns to step S20 ′ and the same procedure is repeated. When Fm = 0 is returned (= when the acceptance process is completed), the determination is satisfied, and the routine goes to Step S180 ′ corresponding to Step S180 in FIG.

ステップＳ１８０′では、上記ステップＳ１４０でゲインが調整されたマイク２０７を用いて受付処理が終了した後、言い換えれば、後述の図１４のステップＳ３６５で上記受付開始フラグＦｍがＦｍ＝０になった後、所定期間が経過したか否かを判定する。Ｆｍ＝０になった後、所定期間が経過するまでは判定が満たされずループ待機し、所定期間が経過したら判定が満たされて、ステップＳ１９０に移る。 In step S180 ′, after the reception process is completed using the microphone 207 whose gain has been adjusted in step S140, in other words, after the reception start flag Fm becomes Fm = 0 in step S365 of FIG. 14 described later. It is determined whether a predetermined period has elapsed. After Fm = 0, the determination is not satisfied until the predetermined period elapses, and the loop waits. When the predetermined period elapses, the determination is satisfied, and the process proceeds to step S190.

ステップＳ１９０は、前述の図１０と同様であるので説明を省略する。 Step S190 is the same as in FIG. 10 described above, so its description is omitted.

なお、以上において、ステップＳ７０′及びステップＳ９０′が、各請求項記載の反射音取得手段として機能する。 In the above, step S70 'and step S90' function as reflected sound acquisition means described in each claim.

図１４は、上記図１３のフローと並行して、受付端末２０の制御回路部２００により実行する制御手順を表すフローチャートであり、前述の図１１に対応する図である。図１１と同等の部分には同符号を付し適宜説明を省略する。 FIG. 14 is a flowchart showing a control procedure executed by the control circuit unit 200 of the reception terminal 20 in parallel with the flow of FIG. 13, and corresponds to FIG. 11 described above. Portions equivalent to those in FIG. 11 are denoted by the same reference numerals, and description thereof is omitted as appropriate.

図１４において、まず図１１のステップＳ２００に代えて設けたステップＳ２０５で、受付処理が開始されたことを表す上記受付開始フラグＦｍをＦｍ＝０に初期化する。 In FIG. 14, first, in step S205 provided in place of step S200 in FIG. 11, the reception start flag Fm indicating that the reception process is started is initialized to Fm = 0.

その後、新たに設けたステップＳ２１２で、受付処理を開始する操作が、来訪者Ｍにより図示しない操作部を介して行われたか否かを判定する。来訪者Ｍにより受付処理を開始する操作が行われるまでループ待機し、来訪者Ｍにより受付処理を開始する操作が行われた場合には、判定が満たされ、新たに設けたステップＳ２１４に移る。 Thereafter, in step S212 newly provided, it is determined whether or not an operation for starting the reception process has been performed by the visitor M via an operation unit (not shown). The loop waits until an operation for starting the reception process is performed by the visitor M. When the operation for starting the reception process is performed by the visitor M, the determination is satisfied, and the process proceeds to step S214 newly provided.

ステップＳ２１４では、上記受付開始フラグＦｍを受付処理の開始を表すＦｍ＝１とし、ステップＳ２２０に移る。 In step S214, the reception start flag Fm is set to Fm = 1 indicating the start of the reception process, and the process proceeds to step S220.

その後のステップＳ２２０〜ステップＳ３５０は、前述の図１１と同様である。ステップＳ３５０において、正当な来訪者Ｍが訪ねてきたことが確認できたことに対応して、対応する担当者のＩＰ電話機６０に発信（コール）を行ったら、図１１のステップＳ３６０に代えて設けたステップＳ３６５に移る。 The subsequent steps S220 to S350 are the same as in FIG. 11 described above. In step S350, once a call has been placed to the IP telephone 60 of the corresponding person in charge in response to confirming that a legitimate visitor M has arrived, the process proceeds to step S365, which is provided in place of step S360 in FIG. 11.

ステップＳ３６５では、上記受付開始フラグＦｍをＦｍ＝０とした後、このフローを終了する。 In step S365, after the reception start flag Fm is set to Fm = 0, this flow is finished.
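
The way the flag Fm couples the reception flow of FIG. 14 to the gain-adjustment flow of FIG. 13 can be sketched as a small state holder. The class, the function, the action labels, and the 30-second hold period are all hypothetical; the description specifies only the flag values and that a "predetermined period" must elapse before adjustment ends.

```python
class ReceptionState:
    """Shared state written by the reception flow and read by the gain flow."""

    def __init__(self):
        self.fm = 0            # reception start flag Fm (S205 initializes to 0)
        self.ended_at = None   # time at which the last reception ended

    def start(self):
        """S214: the visitor started the reception process."""
        self.fm = 1
        self.ended_at = None

    def finish(self, now):
        """S365: the reception process ended."""
        self.fm = 0
        self.ended_at = now

def gain_loop_action(state, now, hold_s=30.0):
    """What the gain flow should do at time `now` (seconds)."""
    if state.fm == 1:
        return "measure"       # S20' to S160': keep measuring and adjusting
    if state.ended_at is None:
        return "wait"          # S15: reception has not started yet
    if now - state.ended_at >= hold_s:
        return "stop"          # S190: end the adjustment, keep the last gain
    return "hold"              # S180': waiting for the predetermined period
```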

本変形例によれば、以下のような効果が得られる。 According to this modification, the following effects can be obtained.

すなわち、距離検出用の検出音の生成元となるソース音に基づき、検出音を生成するとき、ソース音のレベルがあまりにも小さいと、スピーカ２０８を介し出力する検出音のレベルも小さく、その反射音を検出することが困難となる場合があり得る。 That is, when the detection sound is generated based on the source sound that is the generation source of the detection sound for distance detection, if the level of the source sound is too low, the level of the detection sound output through the speaker 208 is also small and the reflection thereof It may be difficult to detect sound.

本変形例では、上記のような場合に対応することができる。すなわち、通常、受付端末２０において対話方式で操作を行おうとする来訪者Ｍは、自己の発話音声をなるべく認識してもらおうという意図が働き、ゆっくりと大きめの音量で発話を行う。したがって、距離検出時に、マイク２０７で入力した、上記のような来訪者Ｍの発話音声に基づいて疑似発話音声を生成し利用することで、精度の高い確実な距離検出を行うことができる。また、来訪者Ｍ自らが発声している発話音声を利用することにより、音を用いて検出していることを来訪者Ｍに比較的悟られにくいという効果もある。 This modification can cope with such a case. That is, a visitor M who intends to operate the reception terminal 20 interactively normally wants his or her speech to be recognized as reliably as possible, and therefore speaks slowly and fairly loudly. Accordingly, by generating and using pseudo speech based on such speech of the visitor M picked up by the microphone 207 at the time of distance detection, accurate and reliable distance detection can be performed. In addition, because the speech uttered by the visitor M is itself used, the visitor M is relatively unlikely to notice that sound is being used for the detection.

（２）ソース音として受付端末の案内音声を利用する場合 (2) Using the guidance voice of the reception terminal as the source sound
以上においては、上記ソース音として、受付端末２０の周囲で発生した雑音や来訪者Ｍの発話音声を利用していたが、これに限られない。すなわち、上記ソース音として、受付処理においてスピーカ２０８を介し出力した案内音声を利用するようにしてもよい。 In the above, noise generated around the reception terminal 20 or the speech of the visitor M was used as the source sound, but the invention is not limited to this. That is, the guidance voice output through the speaker 208 during the reception process may be used as the source sound.

図１５は、本変形例において、スピーカ２０８より疑似案内音声を出力するまでの手順の概要を説明した説明図である。 FIG. 15 is an explanatory diagram illustrating an outline of a procedure until the pseudo guidance voice is output from the speaker 208 in the present modification.

本変形例の受付端末２０では、上記（１）の変形例と同様に、例えば来訪者Ｍによって図示しない操作部が適宜に操作されることにより、受付処理が開始される。そして、図１５（ａ）に示すように、受付処理中において、スピーカ２０８よりソース音としての案内音声（装置音声）が出力されると、この案内音声が伝搬してマイク２０７に入力される（いわゆる案内音声のスピーカ２０８からマイク２０７へのまわり込み）。これにより、入力した案内音声に対応する振幅（又は周波数でもよい）を含むソース音情報としての案内音声情報（装置音声情報）が取得され、この案内音声情報に基づき、疑似案内音声情報が生成される。 In the reception terminal 20 of this modification, as in modification (1) above, the reception process is started when, for example, the visitor M operates an operation unit (not shown). Then, as shown in FIG. 15A, when a guidance voice (device voice) serving as the source sound is output from the speaker 208 during the reception process, this guidance voice propagates and is input to the microphone 207 (the guidance voice leaks from the speaker 208 into the microphone 207). As a result, guidance voice information (device voice information) is acquired as source sound information including the amplitude (or, alternatively, the frequency) corresponding to the input guidance voice, and pseudo guidance voice information is generated based on this guidance voice information.

Then, as shown in FIG. 15(b), once the pseudo guidance voice information has been generated as described above, a pseudo guidance voice (detection sound) is output from the speaker 208 based on it. The pseudo guidance voice is output from the speaker 208 within a predetermined time (for example, 1 msec) after the guidance voice from which it was generated is input to the microphone 207.

In particular, the pseudo guidance voice may be output at the moment the output of the guidance voice from which it was generated ends. A visitor M operating the reception terminal 20 by dialogue generally does not speak while a guidance voice is being output from the speaker 208, because they are listening to its content, and speaks in response to that content only after the output of the guidance voice has ended. Therefore, when the pseudo guidance voice is output at the end of the guidance voice (before the visitor M speaks), the distance to the visitor M at the time of speaking can be detected more accurately.
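One way to picture this timing choice is the following minimal Python sketch, which locates the end of the guidance voice in a buffer of microphone samples so that the detection sound can be scheduled immediately afterwards. The frame length, the energy threshold, and the function name guidance_end_index are assumptions made for illustration; the description does not state how the end of the guidance voice is determined, and a terminal could equally rely on its own playback state.

```python
import numpy as np

def guidance_end_index(signal, frame_len=160, threshold=1e-3):
    """Return the sample index just past the last frame whose mean
    energy exceeds `threshold`, or None if no frame does.

    frame_len and threshold are hypothetical tuning values.
    """
    n_frames = len(signal) // frame_len
    if n_frames == 0:
        return None
    # Drop the incomplete trailing frame and split the rest into frames.
    frames = np.asarray(signal[: n_frames * frame_len], dtype=float)
    energy = (frames.reshape(n_frames, frame_len) ** 2).mean(axis=1)
    active = np.flatnonzero(energy > threshold)
    if active.size == 0:
        return None
    return int(active[-1] + 1) * frame_len
```

The returned index marks the earliest point at which the pseudo guidance voice could be emitted without overlapping the guidance voice itself.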

The subsequent processing is substantially the same as in FIGS. 7 and 8 of the above embodiment: the reflected sound of the pseudo guidance voice is input to the microphone 207, the corresponding reflected sound information is acquired, and the distance to the visitor M is detected. The gain of the microphone 207 is then adjusted according to the detected distance.
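As a rough illustration of this flow, the following Python sketch estimates the round-trip delay between the emitted detection sound and the recorded reflection by cross-correlation, converts the delay to a distance, and maps the distance to a gain. The speed of sound, the relation d = c * t / 2, the linear gain mapping with its limits, and all function names are assumptions made for illustration; the description states only that the required time is proportional to the distance and that the gain is adjusted according to the detected distance.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C

def estimate_delay(emitted, recorded, sample_rate):
    """Return the lag (seconds) at which `recorded` best matches `emitted`."""
    corr = np.correlate(recorded, emitted, mode="full")
    # Index len(emitted) - 1 of the full correlation corresponds to zero lag.
    lag = int(np.argmax(corr)) - (len(emitted) - 1)
    return max(lag, 0) / sample_rate

def distance_from_delay(delay_s):
    """Convert a round-trip delay to a one-way distance (assumed d = c*t/2)."""
    return SPEED_OF_SOUND * delay_s / 2.0

def gain_for_distance(distance_m, near=0.3, far=2.0, g_min=1.0, g_max=8.0):
    """Hypothetical linear mapping: larger gain for a more distant operator."""
    d = min(max(distance_m, near), far)
    return g_min + (g_max - g_min) * (d - near) / (far - near)
```

For example, at a 16 kHz sample rate a reflection arriving 160 samples after emission corresponds to a 10 ms round trip, which this sketch converts to roughly 1.7 m.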

As described above, in this modification, guidance voice information is acquired from the guidance voice that the speaker 208 outputs during the reception process with the visitor M and that is input through the microphone 207 (or, as described later, from the reflected sound of that guidance voice), and the pseudo guidance voice generated based on the acquired guidance voice information is output through the speaker 208. As in modification (1) above, this enables accurate and reliable distance detection, and the visitor M is relatively unlikely to notice that detection is being performed using sound.

Furthermore, because the guidance voice output through the speaker 208, in other words a guidance voice generated inside the reception terminal 20, is used as the source sound, the pseudo guidance voice (information) can be generated based on guidance voice information that includes an amplitude (frequency) convenient for distance detection. That is, when the pseudo guidance voice (information) is generated from such guidance voice information, selecting and using, for example, guidance voice information with a large amplitude can speed up analysis (for example, filtering analysis).
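The selection of a large-amplitude component could look something like the following Python sketch, which picks the strongest spectral component of the captured guidance voice and synthesizes a short tone burst from it as the detection sound. The use of an FFT, the burst duration, and the function names dominant_component and make_detection_burst are assumptions made for illustration; the description does not specify how the pseudo guidance voice is generated from the guidance voice information.

```python
import numpy as np

def dominant_component(signal, sample_rate):
    """Return (frequency_hz, amplitude) of the strongest non-DC spectral bin."""
    x = np.asarray(signal, dtype=float)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sample_rate)
    k = int(np.argmax(np.abs(spectrum[1:]))) + 1  # skip the DC bin
    # Amplitude scaling is valid for bins other than DC and Nyquist.
    amplitude = 2.0 * np.abs(spectrum[k]) / len(x)
    return float(freqs[k]), float(amplitude)

def make_detection_burst(freq_hz, amplitude, sample_rate, duration_s=0.005):
    """Synthesize a short sine burst (hypothetical form of the detection sound)."""
    t = np.arange(int(round(duration_s * sample_rate))) / sample_rate
    return amplitude * np.sin(2.0 * np.pi * freq_hz * t)
```

Working from a single strong component keeps the later matching step simple, which is one plausible reading of why a large-amplitude selection would shorten the analysis.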

Although the above example uses the guidance voice as the source sound, the reflected sound of the guidance voice may be used instead, as appropriate.

(3) Others
In the visitor reception system 1 described above, the reception terminal 20, which detects the distance to the visitor M and adjusts the gain of the microphone 207, and the DB server 10 are separate devices. However, the configuration is not limited to this: a server equipped with a microphone and a speaker may be installed near the entrance of a company so that the server alone performs distance detection to the visitor M, adjustment of the gain of the microphone 207, and the reception process. In addition, the information stored in the HDD 150, such as the visitor reservation DB 151 and the employee DB 155, may be stored in the HDD 205 on the reception terminal 20 side, or may be stored in a separate storage device connectable to the reception terminal 20 via a network, with the necessary information read out during the reception process.

In the above description, the voice input means consists of a single microphone 207, but it is not limited to this and may consist of a plurality of (for example, two) microphones (a so-called array microphone device).

In the above description, as the predetermined calculation process, the time required from output of the detection sound through the speaker 208 until its reflected sound is input to the microphone 207 is measured, and the distance to the visitor M is detected from the relationship that this required time is proportional to the distance to the visitor M (see Equation 1 above). However, the calculation is not limited to this; as the predetermined calculation process, the distance to the visitor M may instead be detected from the phase difference between the output detection sound and the input reflected sound.
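The phase-difference alternative can be sketched in Python as follows, assuming the detection sound is a single tone of frequency f. The reflection travels a path of length 2d, so it lags the emitted tone by a phase of 2*pi*f*(2d)/c, which gives d = c * dphi / (4*pi*f). The speed of sound, the single-tone assumption, and the function names are illustrative assumptions and do not come from the description. Because phase wraps every 2*pi, the result is unambiguous only for d < c / (2f).

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C

def phase_at(signal, freq_hz, sample_rate):
    """Phase (radians) of the `freq_hz` component, obtained by projecting
    the signal onto a complex exponential at that frequency."""
    x = np.asarray(signal, dtype=float)
    n = np.arange(len(x))
    projection = np.sum(x * np.exp(-2j * np.pi * freq_hz * n / sample_rate))
    return float(np.angle(projection))

def distance_from_phase(emitted, reflected, freq_hz, sample_rate):
    """Distance from the phase lag of the reflection (valid for d < c/(2f))."""
    dphi = (phase_at(emitted, freq_hz, sample_rate)
            - phase_at(reflected, freq_hz, sample_rate)) % (2.0 * np.pi)
    return SPEED_OF_SOUND * dphi / (4.0 * np.pi * freq_hz)
```

With a 200 Hz tone, for instance, the unambiguous range under these assumptions is about 0.86 m; a lower tone frequency extends the range at the cost of resolution.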

Note that the arrows shown in FIG. 4, FIG. 5, and the other figures indicate examples of signal flow and do not limit the direction of signal flow.

The flowcharts shown in FIGS. 10, 11, 13, and 14 do not limit the present invention to the procedures shown in those flows; steps may be added, deleted, or reordered without departing from the spirit and technical idea of the invention.

Besides what has already been described, the techniques of the above embodiment and the modifications may be used in appropriate combination.

Although not individually illustrated, the present invention may be implemented with various other changes within a scope that does not depart from its gist.

20 Reception terminal (interactive device)
201 CPU
207 Microphone (voice input means)
208 Speaker (voice output means)
217 Variable gain amplifier
M Visitor (operator)

Claims

An interactive device that an operator can operate in an interactive manner, comprising:
voice input means for inputting voice;
voice output means for outputting voice;
source sound acquisition means for acquiring source sound information, including a corresponding amplitude or frequency, from a source sound that is input via the voice input means and serves as the generation source of a detection sound for distance detection;
detection sound output means for outputting, via the voice output means, the detection sound generated based on the source sound information acquired by the source sound acquisition means, within a predetermined time after the voice input means inputs the source sound;
reflected sound acquisition means for acquiring corresponding reflected sound information from a reflected sound of the detection sound that is reflected at an object and input via the voice input means;
distance detection means for performing a predetermined calculation process based on the reflected sound information acquired by the reflected sound acquisition means, estimating that the object is the operator, and detecting the distance to the operator; and
sensitivity adjustment means for adjusting a gain of the voice input means based on a detection result of the distance detection means.

The interactive device according to claim 1, wherein
the source sound acquisition means acquires utterance voice information as the source sound information from an utterance voice, serving as the source sound, uttered by the operator during dialogue processing of the interactive device, and
the detection sound output means outputs, via the voice output means,
the detection sound generated based on the utterance voice information acquired by the source sound acquisition means.

The interactive device according to claim 1, wherein
the source sound acquisition means acquires device voice information as the source sound information from a device voice, serving as the source sound, that is output from the voice output means and input by the voice input means during dialogue processing with the operator, or from a reflected sound of the device voice, and
the detection sound output means outputs, via the voice output means,
the detection sound generated based on the device voice information acquired by the source sound acquisition means.

The interactive device according to claim 1, wherein
the source sound acquisition means acquires ambient sound information as the source sound information from an ambient sound, serving as the source sound, that is generated around the device and input by the voice input means, and
the detection sound output means outputs, via the voice output means,
the detection sound generated based on the ambient sound information acquired by the source sound acquisition means.

The interactive device according to any one of claims 1 to 4, further comprising start control means for starting the adjustment of the gain by the sensitivity adjustment means when the distance to the operator detected by the distance detection means becomes equal to or less than a predetermined value.

The interactive device according to claim 5, further comprising setting control means for setting the gain of the voice input means to a predetermined value or more when the distance to the operator detected by the distance detection means is larger than the predetermined value.

The interactive device according to claim 6, further comprising termination control means for ending the gain adjustment of the voice input means by the sensitivity adjustment means when a predetermined period has elapsed after completion of dialogue processing using the voice input means whose gain has been adjusted by the sensitivity adjustment means.