JP2007201658A - Intercom system

Info

Publication number: JP2007201658A
Application number: JP2006015835A
Authority: JP
Inventors: Hironori Kuroshita; 広則黒下; Jun Nishimura; 順西村; Yasuhiro Ouchi; 靖弘大内
Original assignee: MegaChips System Solutions Inc
Current assignee: MegaChips System Solutions Inc
Priority date: 2006-01-25
Filing date: 2006-01-25
Publication date: 2007-08-09
Anticipated expiration: 2026-01-25
Also published as: JP4968663B2

Abstract

PROBLEM TO BE SOLVED: To provide an intercom system that analyzes a visitor's voice and responds appropriately by voice even when the visitor is unregistered.

SOLUTION: The intercom system includes: a voice acquisition section that acquires external voice as a voice signal; a storage section 31 that stores voice discrimination data 40 and a plurality of predetermined response voice data 50; a voice analysis section 32 that analyzes the voice signal and determines whether it matches the discrimination data 40 stored in the storage section 31; and a control section 33 that controls external voice output based on one of the predetermined response voice data in the storage section 31, according to the determination result of the voice analysis section 32.

COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to an intercom system that responds appropriately by voice according to the visitor.

Intercom systems incorporate various security measures. For example, some intercom systems store a pre-registered message, spoken in a young man's voice, that declines to answer the visitor. With this registered voice, even a small child who is home alone can safely turn a visitor away without revealing gender or age. Other intercom systems photograph visitors and record their voices while the user is out. With these functions, the user can check on returning home whether a suspicious person has visited.

In the intercom system described in the patent document, the visitor's iris is analyzed and the visitor is identified against iris data registered in the device in advance. Depending on the identification result, the system automatically outputs a pre-registered voice response to the visitor, photographs the visitor, records audio, and sounds a notification tone.

Japanese Patent Laid-Open No. 11-88516

The iris-based intercom system above identifies visitors by iris image recognition. However, iris image recognition requires heavy computation and a high-precision camera, which makes the system expensive. In addition, because the device must be installed close to a person's face, it is within easy reach of people's hands, so it may also fail more often.

Moreover, the above invention responds to every unregistered visitor uniformly with the same voice, so it cannot respond as though the user were at home when the user is actually out. Also, although it distinguishes between visitors, it does not use a voice that declines the visit, so it cannot automatically turn away visitors whom the user does not wish to receive.

An intercom system according to the present invention comprises: a voice acquisition unit that acquires external voice as a voice signal; a storage unit that stores determination data for the voice and data for a plurality of predetermined response voices; a voice analysis unit that analyzes the voice signal and determines whether it matches the determination data stored in the storage unit; and a control unit that, according to the determination result of the voice analysis unit, controls voice output to the outside based on the data of one of the predetermined response voices in the storage unit.

According to the intercom system of the present invention, a pre-registered voice for responding to the visitor is output automatically according to the result of analyzing the visitor's voice. This reduces cost and failure frequency. In addition, because the analysis can use two determinations, for example a keyword contained in the visitor's voice and the voiceprint of the visitor's voice, the visitor can be identified accurately and, for example, either a voice that declines the visit or a voice that receives the visitor can be output.

<Embodiment 1>
FIG. 1 is a block diagram showing the configuration of the intercom system according to the present embodiment. The communication terminal according to the present embodiment includes a handset 1, a display operation terminal 2, and a control device 3. In outline, the handset 1 is the terminal that deals with visitors and is controlled by the control device 3. The user enters settings and commands for the control device 3 from the display operation terminal 2.

FIG. 2 shows the detailed configuration of the handset 1. The handset 1 includes: a button 11 with which a visitor notifies the user of the visit; a microphone 12 that picks up the visitor's voice and converts it into a voice signal; a speaker 13 that converts a voice signal into sound and outputs it; a camera 14 that captures an image and converts it into an image signal; a control unit 15 that controls the input signal of the button 11, the voice signal, and the image signal; and a communication unit 16 that transmits and receives signals between the control unit 15 and the control device 3. With this configuration, the handset 1 outputs sound to the visitor based on the voice signal received from the control device 3. In the other direction, it converts the visitor's operation of the button 11, the visitor's voice, and the captured image into signals and transmits them to the control device 3. In the present embodiment the visitor presses the button 11 to announce the visit, but this is not a limitation: any means that can obtain an input signal announcing a visit from outside (visit notification means) may be used. For example, a touch panel may be used for a similar kind of input, and the microphone 12 may be used for input by voice.

Next, FIG. 3 shows the detailed configuration of the display operation terminal 2. The display operation terminal 2 includes: an operation unit 21 that receives commands and settings from the user; a microphone 22 that picks up sound and converts it into a voice signal; a speaker 23 that converts a voice signal into sound and outputs it; a display unit 24 that displays an image signal as an image; a control unit 25 that controls the input signal of the operation unit 21, the voice signal, and the image signal; and a communication unit 26 that transmits and receives signals between the control unit 25 and the control device 3. With this configuration, the display operation terminal 2 presents to the user sound based on the voice signal and an image based on the image signal received from the control device 3. In the other direction, it converts the user's input on the operation unit 21 and the user's voice into signals and transmits them to the control device 3.

Next, FIG. 4 shows the detailed configuration of the control device 3. The control device 3 is the main part responsible for the operation of the intercom system of the present embodiment. It consists of: a storage unit 31 that stores data on voices, images, and settings; a voice analysis unit 32 that analyzes the voice signal input from the handset 1 to the control device 3 and determines whether the voice signal matches the determination data 40 stored in the storage unit 31; and a control unit 33 that receives various signals from the handset 1 or the display operation terminal 2 and, based on the determination result of the voice analysis unit 32 and the various data in the storage unit 31, controls the handset 1, the display operation terminal 2, and the various signals.

The storage unit 31 stores: determination data 40, used as the criterion by the voice analysis unit 32; response voice data 50, which is voice data for responding to visitors; notification sound data 60, which is data for outputting notification sounds of various patterns from the speaker 23 of the display operation terminal 2; question voice data 70 for asking the visitor a question; and visitor data 80 representing the characteristics of visitors.

The determination data 40 consists of keyword data 41, which holds keywords that the user expects to appear in a visitor's speech and has set in advance, and voiceprint data 42, which holds voiceprints extracted from the voices of visitors who have already visited once and registered by the user. For example, the keyword data 41 is stored in a table such as FIG. 5 and the voiceprint data 42 in a table such as FIG. 6. In these tables, each keyword of FIG. 5 and each voiceprint of FIG. 6 is associated with a response voice and a notification sound. A keyword may be entered from the operation unit 21, or it may be extracted by the voice analysis unit 32 from a voice signal input through the microphone 22.

The response voice data 50 consists of answer voice data 51 (対応音声), used for visitors who need to be attended to, and rejection voice data 52 (拒否音声), used for visitors who do not. For example, the answer voice data 51 is stored in a table such as FIG. 7 and the rejection voice data 52 in a table such as FIG. 8. An answer voice tells the visitor, for example, that the user is out; a rejection voice declines the visit without revealing that the user is out.

The notification sound data 60 stores various sound patterns, as in the table of FIG. 9. The answer notification sound data 61 and the rejection notification sound data 62 are patterns that the user has selected from that table and assigned to each purpose.

The question voice data 70 is voice data whose content is a question to the visitor, for example "Yes, who is it?" or "How can I help you?"

The visitor data 80 consists of visitor voice data 81, visitor image data 82, voiceprint data 83 for the voices of visitors whose voiceprints are not registered, and visit count data 84 recording how many times each visitor came during a given period.
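The layout of the storage unit 31 described above can be pictured as a small set of tables. The following is only a sketch under stated assumptions: the class and field names (`Storage`, `DeterminationEntry`, and so on) are hypothetical and do not come from the patent, which specifies only the reference numerals noted in the comments and the fact that keywords and voiceprints are each associated with a response voice and a notification sound.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DeterminationEntry:
    """One row of a FIG. 5 / FIG. 6 style table (field names are hypothetical)."""
    response_voice: str      # identifier of an answer or rejection voice
    notification_sound: str  # identifier of a notification sound pattern


@dataclass
class Storage:
    """Sketch of storage unit 31; comments give the patent's reference numerals."""
    keyword_table: Dict[str, DeterminationEntry] = field(default_factory=dict)     # 41
    voiceprint_table: Dict[str, DeterminationEntry] = field(default_factory=dict)  # 42
    answer_voices: Dict[str, str] = field(default_factory=dict)                    # 51
    reject_voices: Dict[str, str] = field(default_factory=dict)                    # 52
    notification_sounds: Dict[str, str] = field(default_factory=dict)              # 60
    question_voices: List[str] = field(default_factory=list)                       # 70
    visitor_audio: List[bytes] = field(default_factory=list)                       # 81
    visitor_images: List[bytes] = field(default_factory=list)                      # 82
    unregistered_voiceprints: List[str] = field(default_factory=list)              # 83
    visit_counts: Dict[str, int] = field(default_factory=dict)                     # 84


storage = Storage()
# The keyword and its response voice follow the example given in the description;
# the notification pattern name is made up for illustration.
storage.keyword_table["shinbun"] = DeterminationEntry("reject voice NO.1", "pattern 2")
storage.reject_voices["reject voice NO.1"] = "No, thank you."
```

Keeping keywords and voiceprints in separate tables with the same row shape mirrors the description, in which either table alone can drive the response selection.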

The operation of the control device 3 is described with reference to the flowchart of FIG. 10.

When a visitor performs an operation such as pressing the button 11 of the handset 1 (ST-11), an input signal is sent to the control unit 33. The control unit 33 then outputs a voice from the speaker 13 of the handset 1 according to the question voice data 70 stored in the storage unit 31 (ST-12).

The question voice says, for example, "Yes, who is it?" or "How can I help you?" It is used to prompt the visitor to speak.

The visitor then answers the question voice by speaking (ST-13). The microphone 12 of the handset 1 picks up the visitor's voice and converts it into a voice signal.

The voice signal is passed to the voice analysis unit 32, analyzed, and checked for a match with the determination data 40 in the storage unit 31 (ST-14). The determination data 40 comprises two kinds of data, the keyword data 41 and the voiceprint data 42, but in the present embodiment only one of the two is used for the determination. The case where both are used is described in Embodiment 2.

When the keyword data 41 is used as the determination data 40, the voice analysis unit 32 analyzes the voice signal and determines whether it contains a keyword stored in the storage unit 31 (FIG. 5). When the voiceprint data 42 is used as the determination data 40, the voice analysis unit 32 analyzes the voice signal and determines whether it has a voiceprint stored in the storage unit 31 (FIG. 6). In either case, the determination result is passed to the control unit 33.

Based on the determination result of ST-14, the control unit 33 selects one response voice from the plurality of response voice data 50 stored in the storage unit 31. For example, if the visitor's speech is determined to contain the keyword "shinbun" (「しんぶん」, newspaper), "rejection voice NO.1" is selected according to FIG. 5. The control unit 33 then controls the handset 1 so that the selected response voice is output from the speaker 13 of the handset 1 (ST-15). When "rejection voice NO.1" has been selected, the speaker 13 of the handset 1 outputs "No, thank you." (「結構です。」) according to FIG. 8.

The response voices consist of answer voices such as those in FIG. 7 and rejection voices such as those in FIG. 8, but the division between answer voice and rejection voice does not follow whether the voice analysis unit 32 found a match. This is because a visitor matched by the voice analysis unit 32 may be someone the user wants to receive or someone the user does not. Consequently, as in FIG. 5 and FIG. 6, the response voices used when the determination matches are a mixture of answer voices and rejection voices. In the end, the user assigns an answer voice or a rejection voice by anticipating who, or what kind of person, a matched visitor is. When the voice analysis unit 32 finds no match, no such assumption about the visitor can be made. In that case, whichever of the answer voice or the rejection voice the user has chosen in advance is output from the speaker 13 of the handset 1 (ST-16).
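The keyword branch of steps ST-14 to ST-16 can be sketched as a table lookup with a user-chosen fallback. This is a minimal illustration, not the patented implementation: it assumes the visitor's speech has already been turned into text by some recognizer (the patent does not say how keywords are detected), and every entry other than the "shinbun" keyword and the "No, thank you." rejection voice is a made-up example.

```python
# Keyword table in the style of FIG. 5: keyword -> response voice identifier.
KEYWORD_TABLE = {
    "shinbun": "reject voice NO.1",   # example given in the description
    "takuhai": "answer voice NO.1",   # hypothetical entry (parcel delivery)
}

# Response voices in the style of FIG. 7 and FIG. 8.
RESPONSE_VOICES = {
    "reject voice NO.1": "No, thank you.",                 # from the description
    "answer voice NO.1": "Nobody can come to the door.",   # hypothetical wording
}

# Voice the user has chosen in advance for unmatched visitors (ST-16).
DEFAULT_RESPONSE = "answer voice NO.1"


def select_response(transcript: str) -> str:
    """Return the response voice identifier for a visitor's utterance (ST-14)."""
    for keyword, response_id in KEYWORD_TABLE.items():
        if keyword in transcript:
            return response_id        # match: table decides answer or rejection
    return DEFAULT_RESPONSE           # no match: fall back to the user's choice


def response_text(response_id: str) -> str:
    """Look up what the handset speaker would say (ST-15 / ST-16)."""
    return RESPONSE_VOICES[response_id]
```

Note that a match can lead to either kind of voice: the table, not the match itself, decides whether the visitor is received or turned away.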

As described above, by evaluating the visitor's voice against a keyword or a voiceprint, the system can identify the visitor and automatically respond by voice according to the user's settings. Unlike the conventional approach, visitor identification needs only the microphone 12 rather than a high-precision camera, which keeps system cost down. Furthermore, even when the user is out, the system can give the impression that the user is at home and answering, so it can be expected to deter burglary while the user is away.

The question voice, answer voice, and rejection voice may be fixed voices stored in the device from the outset, but they may also be the user's own voice, input beforehand through the microphone 22 of the display operation terminal 2 and stored as data in the storage unit 31. This lets the user freely change what is said, in the user's own voice. It strengthens the impression that the user is at home and answering, so a further deterrent effect against burglary while the user is out can be expected.

As for this crime-prevention effect, selecting the rejection voice at random, so that a different message is used each time, gives an even stronger impression that someone is at home. The operations described above are performed at all times, regardless of whether the user is at home or out.
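Random selection of the rejection voice might look like the sketch below. Excluding the message used last time is an assumption made here so that consecutive visits hear different messages; the patent only says the rejection voice is used at random with different content each time. The function name and the third message are hypothetical.

```python
import random

REJECT_VOICES = [
    "No, thank you.",          # from the description
    "Don't come again!",       # from the description
    "We are not interested.",  # hypothetical extra message
]


def pick_reject_voice(voices, last=None, rng=random):
    """Pick a rejection voice at random, avoiding the one used last time."""
    candidates = [v for v in voices if v != last]
    if not candidates:            # only one voice registered: reuse it
        candidates = list(voices)
    return rng.choice(candidates)
```

A caller would keep the previously returned message and pass it back as `last` on the next visit.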

The user's voice recorded as a question voice, answer voice, or rejection voice can also be passed through voice quality conversion. This can be realized simply by adding the function to the signal control functions of the control unit 33. The voice quality conversion allows attributes such as gender and age to be set, as in the rejection voice data table of FIG. 8. In this way, even a young woman living alone can firmly turn a visitor away with "Don't come again!" in a deep male voice.

Next, the notification sound data 60 is described. Various notification sound patterns are stored, as in the table of FIG. 9. From this table the user selects the answer notification sound data 61, the pattern that sounds indoors when a visitor who needs attention arrives, and the rejection notification sound data 62, the pattern that sounds indoors when a visitor who does not need attention arrives. The answer notification sound is emitted from the speaker 23 of the indoor display operation terminal 2 at the same time as the answer voice is output, and the rejection notification sound is emitted from the same speaker at the same time as the rejection voice is output. In this way, when the user is at home, the user can tell what kind of visitor has arrived before answering the intercom. These notification sounds need not always be emitted: the device may be provided with a switch that the user can set so that no notification sound is emitted. In the present embodiment the display operation terminal 2 notifies the user of the determination result of the voice analysis unit 32 with a sound pattern, but this is not a limitation: any means of notifying the user of the determination result of the voice analysis unit 32 (determination notification means) may be used. For example, the user may be notified by a steady light, a blinking light, or an on-screen display.
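The pairing of notification sounds with response types, together with the mute switch, can be sketched as follows. The class name, field names, and pattern labels are hypothetical; only the idea of one user-selected pattern per response type (data 61 and 62) and an on/off switch comes from the description.

```python
from dataclasses import dataclass
from typing import Optional

ANSWER = "answer"
REJECT = "reject"


@dataclass
class NotificationSettings:
    answer_sound: str = "pattern 1"   # answer notification sound data 61
    reject_sound: str = "pattern 2"   # rejection notification sound data 62
    enabled: bool = True              # the user-operated switch


def notification_sound(kind: str, settings: NotificationSettings) -> Optional[str]:
    """Return the pattern to play indoors, or None when notifications are off."""
    if not settings.enabled:
        return None
    if kind == ANSWER:
        return settings.answer_sound
    if kind == REJECT:
        return settings.reject_sound
    raise ValueError(f"unknown response kind: {kind!r}")
```

The same function could drive a light or an on-screen message instead of a sound, which is the alternative notification means the description mentions.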

Next, the visitor data 80 is described. The visitor data 80 is obtained by acquiring the visitor's voice and image from the handset 1 while the user is out and storing them in the storage unit 31 as data representing the visitor's characteristics. It consists of the visitor voice data 81, the visitor image data 82, the voiceprint data 83 of visitor voices not registered in the determination data 40, and the visit count data 84, which records, for each visitor voiceprint, the number of times the system responded to that visitor during a given period, such as while the user was out.

When a visitor who needs attention arrives, that is, while the answer voice is being output, the visitor's voice is picked up by the microphone 12 of the handset 1. The voice is converted into a voice signal and stored in the storage unit 31, via the control unit 33, as visitor voice data 81. At the same time, an image of the visitor is captured by the camera 14 of the handset 1. The image is converted into an image signal and stored in the storage unit 31, via the control unit 33, as visitor image data 82. By storing the visitor voice data 81 and the visitor image data 82, the user can check after returning home what the visitors who needed attention looked like and why they came.

When the voiceprint determination by the voice analysis unit 32 finds that a voiceprint is not registered, the voiceprint data of that voice is stored in the storage unit 31 as visitor voiceprint data 83. At the same time, the visitor's voice picked up by the microphone 12 of the handset 1 is stored in the storage unit 31 as visitor voice data 81, and the visitor's image captured by the camera 14 of the handset 1 is stored as visitor image data 82. The user reviews the stored voice and image and, where appropriate, registers the visitor voiceprint data 83 as voiceprint data 42 for determination. Accumulating determination data 40 in this way makes the determination in ST-14 more accurate.
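The enrolment loop described here (store an unknown voiceprint with its recording and image, then let the user promote it to the determination data) can be sketched as below. Voiceprints are represented by opaque string identifiers, which is an assumption made for brevity; the class and method names are hypothetical.

```python
class VoiceprintStore:
    """Sketch of how voiceprint data 83 can be promoted to voiceprint data 42."""

    def __init__(self):
        self.registered = {}  # voiceprint data 42: voiceprint id -> response voice id
        self.pending = {}     # voiceprint data 83 with its voice (81) and image (82)

    def observe(self, voiceprint_id, audio, image):
        """Return the response voice for a known voiceprint, else file it as pending."""
        if voiceprint_id in self.registered:
            return self.registered[voiceprint_id]
        self.pending[voiceprint_id] = {"audio": audio, "image": image}
        return None

    def register(self, voiceprint_id, response_voice):
        """Called after the user has reviewed the stored voice and image."""
        self.pending.pop(voiceprint_id)  # raises KeyError if it was never observed
        self.registered[voiceprint_id] = response_voice
```

After `register` is called, a later visit with the same voiceprint is matched directly and no longer goes through the pending list.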

The visit count data 84 consists of a number associated with each voiceprint. During a given period, each time the voiceprint analysis by the voice analysis unit 32 finds a match, the control unit 33 increments the corresponding number. This tells the user how many times each visitor came while the user was out. The user can use this count as a guide to whether a visitor is suspicious and whether the purpose of the visit is urgent.
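Counting visits per voiceprint amounts to a keyed counter that is reset at the start of each period. The sketch below assumes voiceprints are identified by string keys and that the period boundary is signalled by an explicit `reset` call; both are illustrative choices, not details from the patent.

```python
from collections import Counter


class VisitCounter:
    """Sketch of visit count data 84: matches per voiceprint within one period."""

    def __init__(self):
        self._counts = Counter()

    def record_match(self, voiceprint_id: str) -> None:
        """Call each time the voiceprint analysis reports a match."""
        self._counts[voiceprint_id] += 1

    def count(self, voiceprint_id: str) -> int:
        """Number of visits recorded for this voiceprint in the current period."""
        return self._counts[voiceprint_id]

    def reset(self) -> None:
        """Start a new period, for example when the user leaves home."""
        self._counts.clear()
```

A voiceprint that has never matched simply reports zero, because `Counter` returns 0 for missing keys.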

<Embodiment 2>
The configuration of the intercom system according to the present embodiment is the same as in Embodiment 1 (FIG. 4). In Embodiment 1, the visitor's voice was evaluated using only one of the keyword determination and the voiceprint determination. The user then anticipated who the matched visitor would be and set the answer voice or rejection voice to output to that visitor. Whether the answer voice and rejection voice are used appropriately depends on how accurately the determination identifies who the visitor is. Moreover, when the voice analysis unit 32 finds no match, the choice between the answer voice and the rejection voice is left to the user's judgment, which makes the scheme incomplete. The present embodiment therefore uses both the keyword determination and the voiceprint determination. Embodiment 1 treated the two in the same way, but they differ in their strengths and weaknesses, as follows.

The keyword method has the advantage that even a first-time visitor can be identified to some extent if a keyword appears in the visitor's speech, and that the visitor can be narrowed down further as the number of keyword determinations increases. Its drawbacks are that it depends on chance, since a visitor's speech may merely happen to contain a keyword, and that identifying a visitor accurately requires making the visitor answer many times.

The voiceprint method has the advantage that a single determination is enough to identify exactly who a registered visitor is, because a person's voice is almost unique to the individual. In this respect it is superior to the keyword method. Its drawback is that repeating the voiceprint determination does not narrow down the visitor any further, which is the same problem as in the prior art.

For these reasons, instead of using only the keyword determination or only the voiceprint determination, both are used, as in the flowchart of FIG. 11, so as to exploit the strengths of each while offsetting its weaknesses.

来訪者が子機１のボタン１１を押すなどの操作によって（ＳＴ−２１）、入力信号が制御部３３に伝えられる。制御部３３は、記憶部３１に記憶された質問音声のデータ７０に応じて子機１のスピーカ１３から音声を出力する（ＳＴ−２２）。 An input signal is transmitted to the control unit 33 by an operation such as a visitor pressing the button 11 of the handset 1 (ST-21). The control unit 33 outputs a sound from the speaker 13 of the handset 1 according to the question sound data 70 stored in the storage unit 31 (ST-22).

そして、来訪者は質問音声に対して音声で回答する（ＳＴ−２３）。子機１のマイク１２は来訪者の音声を取得し、音声信号に変換する。 Then, the visitor answers the question voice by voice (ST-23). The microphone 12 of the handset 1 acquires the voice of the visitor and converts it into a voice signal.

その音声信号は音声解析部３２に伝えられ、解析され、キーワードのデータ４１と一致するか否かについて判定される（ＳＴ−２４）。 The voice signal is transmitted to the voice analysis unit 32 and analyzed to determine whether or not it matches the keyword data 41 (ST-24).

そして、キーワードによる判定によって選ばれた応答音声のデータ５０に応じた音声を子機１のスピーカ１３で出力するように、制御部３３は子機１を制御する（ＳＴ−２５）。 And the control part 33 controls the subunit | mobile_unit 1 so that the audio | voice according to the data 50 of the response voice selected by the determination by a keyword may be output with the speaker 13 of the subunit | mobile_unit 1 (ST-25).

キーワードによる判定で一致しないと判定された音声信号については、さらに音声解析部３２によって解析され、声紋のデータ４２と一致するか否かについて判定される（ＳＴ−２６）。 The voice signal determined not to match by the determination by the keyword is further analyzed by the voice analysis unit 32 to determine whether or not it matches the voiceprint data 42 (ST-26).

The control unit 33 then controls the handset 1 so that voice corresponding to the response voice data 50 selected by the voiceprint determination is output from the speaker 13 of the handset 1 (ST-27).

In this way, even if a visitor could not be identified because the visitor's voice happened not to contain a keyword, the visitor can still be identified appropriately, provided that the visitor has visited before and the user has registered the visitor's voiceprint.

On the other hand, if the voiceprint determination finds no matching data, either the handling voice or the rejection voice, as set in advance at the user's discretion, is output from the speaker 13 of the handset 1 (ST-28). However, because the keyword determination is performed first, cases in which no determination matches can be reduced compared with Embodiment 1.

As a result, the accuracy of identifying visitors can be improved and, in addition, the number of visitors for whom no determination matches can be reduced compared with the prior art. Although not illustrated, a voiceprint determination may be added after ST-25 as necessary, to allow for cases in which a keyword matches by chance.
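The order of determinations in FIG. 11 (ST-24 to ST-28) can be illustrated with a minimal Python sketch. Every name in it (`Utterance`, `keyword_match`, `voiceprint_match`, `select_response_fig11`) and the data layout are hypothetical and chosen only for illustration. Speech recognition and voiceprint comparison are replaced by simple string and identifier lookups, which is an assumption of the sketch and not the analysis actually performed by the voice analysis unit 32. The strings "handling" and "rejection" stand in for data 51 and data 52 within the response voice data 50.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical stand-ins for the stored data; the real formats are not specified.
KEYWORD_DATA = {"delivery": "handling", "sales": "rejection"}            # keyword data 41
VOICEPRINT_DATA = {"vp-neighbor": "handling", "vp-sales": "rejection"}   # voiceprint data 42
DEFAULT_RESPONSE = "rejection"  # assumed user setting used at ST-28


@dataclass
class Utterance:
    text: str                     # assumed result of speech recognition
    voiceprint_id: Optional[str]  # assumed result of voiceprint extraction


def keyword_match(utt: Utterance) -> Optional[str]:
    """ST-24: return the response tied to the first keyword found, if any."""
    lowered = utt.text.lower()
    for keyword, response in KEYWORD_DATA.items():
        if keyword in lowered:
            return response
    return None


def voiceprint_match(utt: Utterance) -> Optional[str]:
    """ST-26: return the response tied to a registered voiceprint, if any."""
    if utt.voiceprint_id is None:
        return None
    return VOICEPRINT_DATA.get(utt.voiceprint_id)


def select_response_fig11(utt: Utterance) -> Tuple[str, str]:
    """Return (step, response) in the order keyword, voiceprint, default."""
    response = keyword_match(utt)
    if response is not None:
        return "ST-25", response
    response = voiceprint_match(utt)
    if response is not None:
        return "ST-27", response
    # Neither determination matched: fall back to the user-set response.
    return "ST-28", DEFAULT_RESPONSE
```

For example, `select_response_fig11(Utterance("Hello", "vp-neighbor"))` contains no keyword, so the sketch falls through to the voiceprint lookup and selects the response at ST-27.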

As another combination of determinations, a flowchart such as that of FIG. 12 is also conceivable.

When a visitor performs an operation such as pressing the button 11 of the handset 1 (ST-31), an input signal is transmitted to the control unit 33. The control unit 33 outputs voice from the speaker 13 of the handset 1 in accordance with the question voice data 70 stored in the storage unit 31 (ST-32).

The visitor then answers the question voice by voice (ST-33). The microphone 12 of the handset 1 acquires the visitor's voice and converts it into a voice signal.

The voice signal is transmitted to the voice analysis unit 32, where it is analyzed and a determination is made as to whether or not it matches the voiceprint data 42 (ST-34).

The control unit 33 then controls the handset 1 so that voice corresponding to the response voice data 50 selected by the voiceprint determination is output from the speaker 13 of the handset 1 (ST-35).

A voice signal determined not to match in the voiceprint determination is further analyzed by the voice analysis unit 32, which determines whether or not it matches the keyword data 41 (ST-36).

The control unit 33 then controls the handset 1 so that voice corresponding to the response voice data 50 selected by the keyword determination is output from the speaker 13 of the handset 1 (ST-37).

For a voice signal still determined not to match, a question voice whose content differs from that of the earlier question voice is output (ST-38), and the visitor is made to answer by voice (ST-39). The keyword determination is then performed again to determine whether or not the answer matches the keyword data 41. By repeating this, all visitors can be identified accurately and an appropriate response voice can be used.

With this combination of determinations, the voiceprint determination is used first, so a visitor who has visited before and whose voiceprint has been registered can be identified in a single determination, and the handling voice can be output immediately. Even a visitor determined not to match in the voiceprint determination can be identified by the keyword determination. If the keyword determination also fails to match, a question voice is output, the visitor is made to answer, and the keyword determination is performed again. Performing the keyword determination several times allows the visitor to be identified still more accurately.

As a result, the accuracy of identifying visitors can be improved and, in addition, the number of visitors for whom no determination matches can be reduced significantly compared with the prior art.
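The flow of FIG. 12 (voiceprint first at ST-34, keyword next at ST-36, then further questions at ST-38 and ST-39) can likewise be sketched in Python. The names `match_keyword`, `identify_fig12`, and `ask`, the sample questions, and the data layout are all hypothetical. The text describes repeating the question step; limiting the repetition to a fixed list of follow-up questions is an assumption made here so that the sketch always terminates. The strings "handling" and "rejection" stand in for data 51 and data 52 within the response voice data 50.

```python
from typing import Callable, Optional, Tuple

# Hypothetical stored data; the real formats are not specified in the text.
VOICEPRINTS = {"vp-neighbor": "handling"}                    # voiceprint data 42
KEYWORDS = {"delivery": "handling", "sales": "rejection"}    # keyword data 41
FOLLOW_UP_QUESTIONS = [                                      # question voice data 70
    "What is the purpose of your visit?",
    "Which company are you from?",
]


def match_keyword(text: str) -> Optional[str]:
    """Return the response tied to the first keyword contained in the text."""
    lowered = text.lower()
    for keyword, response in KEYWORDS.items():
        if keyword in lowered:
            return response
    return None


def identify_fig12(first_answer: str,
                   voiceprint_id: Optional[str],
                   ask: Callable[[str], str]) -> Tuple[str, Optional[str]]:
    """Voiceprint first (ST-34), then keyword (ST-36), then re-question (ST-38/39)."""
    if voiceprint_id is not None and voiceprint_id in VOICEPRINTS:
        return "ST-35", VOICEPRINTS[voiceprint_id]
    response = match_keyword(first_answer)
    if response is not None:
        return "ST-37", response
    # Ask a different question each time and repeat the keyword determination.
    # Bounding the loop by the question list is an assumption of this sketch.
    for question in FOLLOW_UP_QUESTIONS:
        response = match_keyword(ask(question))
        if response is not None:
            return "ST-37", response
    return "unmatched", None
```

Here `ask` stands for outputting a question voice from the speaker 13 and returning the recognized answer. A registered visitor returns at ST-35 without any keyword determination, which corresponds to the single-determination advantage described above.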

As described above, by combining the keyword determination and the voiceprint determination, the intercom system can accurately determine whether or not a visitor is one who does not need to be informed of the resident's absence. This means that the response voice and the rejection voice can each be used correctly for the appropriate visitor.

This reduces cases in which the handling voice is mistakenly used for a visitor who does not need to be informed of the absence, and cases in which the rejection voice is mistakenly used for a visitor who does need to be informed of the absence.

FIG. 1 is a block diagram of the intercom system according to Embodiment 1.
FIG. 2 is a block diagram of the handset according to Embodiment 1.
FIG. 3 is a block diagram of the display operation terminal according to Embodiment 1.
FIG. 4 is a block diagram of the control device according to Embodiment 1.
FIG. 5 is a diagram showing correspondence relationships in the intercom system according to Embodiment 1.
FIG. 6 is a diagram showing correspondence relationships in the intercom system according to Embodiment 1.
FIG. 7 is a diagram showing correspondence relationships in the intercom system according to Embodiment 1.
FIG. 8 is a diagram showing correspondence relationships in the intercom system according to Embodiment 1.
FIG. 9 is a diagram showing correspondence relationships in the intercom system according to Embodiment 1.
FIG. 10 is a flowchart showing the operation of the intercom system according to Embodiment 1.
FIG. 11 is a flowchart showing the operation of the intercom system according to Embodiment 2.
FIG. 12 is a flowchart showing the operation of the intercom system according to Embodiment 2.

Explanation of symbols

1 Handset
2 Display operation terminal
3 Control device
11 Button
12 Microphone
13 Speaker
14 Camera
15 Control unit
16 Communication unit
21 Operation unit
22 Microphone
23 Speaker
24 Display unit
25 Control unit
26 Communication unit
31 Storage unit
32 Voice analysis unit
33 Control unit
40 Determination data
41 Keyword data
42 Voiceprint data
50 Response voice data
51 Handling voice data
52 Rejection voice data
60 Notification sound data
61 Handling notification sound data
62 Rejection notification sound data
70 Question voice data
80 Visitor data
81 Voice data
82 Image data
83 Voiceprint data
84 Visit count data

Claims

An intercom system comprising:
a voice acquisition unit that acquires external voice as a voice signal;
a storage unit that stores determination data for the voice and a plurality of predetermined response voice data;
a voice analysis unit that analyzes the voice signal and determines whether or not it matches the determination data stored in the storage unit; and
a control unit that, in accordance with the determination result of the voice analysis unit, controls voice output to the outside based on one of the predetermined response voice data in the storage unit.

The intercom system according to claim 1,
wherein the determination data includes data of a keyword contained in the voice.

The intercom system according to claim 1,
wherein the determination data includes voiceprint data of the voice.

The intercom system according to claim 1,
wherein the determination data includes data of a keyword contained in the voice and voiceprint data of the voice.

The intercom system according to any one of claims 1 to 4, further comprising visit notification means,
wherein the storage unit further stores data of a predetermined first question voice that prompts a visitor to speak, and
the control unit controls voice output to the outside, in response to operation of the visit notification means, based on the data of the predetermined first question voice in the storage unit.

The intercom system according to any one of claims 1 to 5,
wherein the response voice data includes at least one of rejection voice data used for visitors who do not need to be handled and handling voice data used for visitors who need to be handled.

The intercom system according to claim 6,
wherein the response voice data includes data of a predetermined second question voice that prompts the visitor to speak.

The intercom system according to claim 6 or 7,
wherein at least one of the question voice data and the response voice data includes data of a user's voice.

The intercom system according to claim 8,
wherein the data of the user's voice includes data obtained by converting the user's voice.

The intercom system according to any one of claims 6 to 9, further comprising determination notification means for notifying a user of the determination result of the voice analysis unit.

The intercom system according to claim 3 or 4,
wherein the storage unit stores, for each item of the voiceprint data, data of the number of visits within a predetermined period, and the control unit accumulates the number of visits when a match with that voiceprint data is determined.

The intercom system according to claim 3 or 4, further comprising an image acquisition unit that acquires an external image as an image signal,
wherein, when the voice analysis unit determines that there is no match with the voiceprint data, the voiceprint data, the voice signal data acquired from the voice acquisition unit, and the image signal data acquired from the image acquisition unit are stored in the storage unit.

The intercom system according to claim 6, further comprising an image acquisition unit that acquires an external image as an image signal,
wherein the control unit controls the voice output based on the handling voice data, and stores, in the storage unit, the voice signal data acquired from the voice acquisition unit and the image signal data acquired from the image acquisition unit.