JP2023123165A

JP2023123165A - intercom device

Info

Publication number: JP2023123165A
Application number: JP2022027069A
Authority: JP
Inventors: 華妃山下; Haruhi Yamashita; 沙也加齋藤; Sayaka Saito; 太貴浅野; Taiki Asano; 友喜桜木; Tomoki Sakuragi; 大樹森; Daiki Mori; 紀行髙橋; Noriyuki Takahashi
Original assignee: JVCKenwood Corp
Current assignee: JVCKenwood Corp
Priority date: 2022-02-24
Filing date: 2022-02-24
Publication date: 2023-09-05

Abstract

To provide an intercom device that can ensure the privacy and security of residents. SOLUTION: An intercom device 1 includes: an image acquisition unit 32 that acquires an image captured by an imaging unit 11 that images a visitor; a determination unit 33 that determines, on the basis of the captured image acquired by the image acquisition unit 32, at least one of whether the visitor has been recognized and whether the visitor is a registrant registered in advance; and a response voice selection unit 34 that selects, according to the determination result of the determination unit 33, a response voice to be output from a slave-side audio output unit 14 of a slave unit 10, and outputs the selected response voice to the slave-side audio output unit 14. SELECTED DRAWING: Figure 1

Description

The present disclosure relates to an intercom device.

Intercom devices are known that authenticate a visitor on the basis of an image, captured by a camera, of the visitor performing a calling operation to call a resident using an intercom slave unit installed outside the residence.

Patent Document 1 discloses an intercom device in which, if a certain period of time elapses after a calling operation without a response from the resident, the CPU of the master unit shifts to a special operation mode and authenticates the visitor using an image captured by a camera provided in the entrance slave unit. After authenticating the visitor, the device establishes a communication path between the entrance slave unit and a handy master unit, and if a further certain period of time then elapses without a response, it unlocks an electric lock. According to the technique described in Patent Document 1, when the visitor is a welfare worker calling from the entrance slave unit, a call is possible even without an answering operation by the resident, and the electric lock of the entrance can also be unlocked, enabling a quick response without waiting for the resident's family or others to respond.

Patent Document 2 discloses an intercom system, a control method, and a program, in which the intercom system includes an intercom master unit provided in a first management area, an intercom slave unit provided in a second management area, and a signal output unit. The signal output unit outputs an unlock signal for unlocking an opening/closing member that separates the first management area from the second management area. The intercom master unit has at least one of a master-side display unit and a recording unit. The master-side display unit displays at least one of a first video and a second video, and the recording unit records at least one of the first video and the second video. When the signal output unit outputs the unlock signal, the intercom master unit performs at least one of display by the master-side display unit and recording by the recording unit during an extension period. According to the technique described in Patent Document 2, monitoring of the second management area can be strengthened.

Patent Document 1: JP 2016-32212 A

Patent Document 2: JP 2019-140618 A

However, the technique described in Patent Document 1 enables a call without a response from the resident when the visitor has been registered in advance; it does not disclose how to respond when the visitor is a suspicious person.

The technique described in Patent Document 2 records captured video according to a face authentication result, but it does not disclose how to respond when the visitor is not an acquaintance of the resident, or is a suspicious person who behaves so that his or her face does not appear in the video.

Consequently, these techniques have a security problem when the visitor is a person who has not been registered in advance (a non-acquaintance or a suspicious person): by hearing the resident's voice, the visitor may infer personal information about the resident, such as gender, or may learn that the resident is at home.

The present disclosure has been made to solve such problems, and an object thereof is to provide an intercom device capable of ensuring the privacy and security of residents.

An intercom device according to one embodiment includes an intercom slave unit used by a visitor, and has: an image acquisition unit that acquires an image captured by an imaging unit that images the visitor; a determination unit that determines, on the basis of the captured image acquired by the image acquisition unit, at least one of whether the visitor has been recognized and whether the visitor is a registrant registered in advance; and a response voice selection unit that selects, according to the determination result of the determination unit, a response voice to be output from a slave-side audio output unit of the intercom slave unit, and outputs the selected response voice to the slave-side audio output unit.

The present disclosure can provide an intercom device capable of ensuring the privacy and security of residents.

FIG. 1 is a block diagram showing the configuration of an intercom device according to Embodiment 1. FIG. 2 is a flowchart for explaining the operation of the intercom device shown in FIG. 1.

Embodiment 1
This embodiment is described below with reference to the drawings. For clarity, the following description and drawings are simplified as appropriate. The drawings show only part of the overall configuration; an actual device includes many other components that are not shown. In the following description, identical or equivalent elements are denoted by the same reference numerals, and duplicate descriptions are omitted.

As one preferred embodiment, an intercom device 1 applied to a detached house is described. Besides detached houses, the intercom device 1 may also be applied to collective housing such as condominiums, or to non-residential facilities and the like.

First, the configuration of the intercom device 1 is described with reference to FIG. 1, which is a block diagram showing the configuration of the intercom device according to Embodiment 1. As shown in FIG. 1, the intercom device 1 includes an intercom master unit (hereinafter, master unit 20) installed in a living room or the like inside the residence, and an intercom slave unit (hereinafter, slave unit 10) installed at an entrance or the like outside the residence. The master unit 20 is a user interface used by the resident to, for example, answer a call from the slave unit 10. The slave unit 10 is a user interface used by a visitor to, for example, call and talk with the resident.

The master unit 20 and the slave unit 10 are configured to communicate with each other. The intercom device 1 may include one or more slave units 10. In the intercom device 1, after a visitor presses the call button of the slave unit 10, when the resident presses the call start button of the master unit 20 to answer the call, a communication path is established and a call between the master unit 20 and the slave unit 10 becomes possible.

The intercom device 1 according to the present embodiment has a response voice control device 30 that includes an image acquisition unit 32, a determination unit 33, and a response voice selection unit 34. The image acquisition unit 32 acquires an image captured by the imaging unit 11, which images the visitor. On the basis of the captured image acquired by the image acquisition unit 32, the determination unit 33 determines at least one of whether the visitor has been recognized and whether the visitor is a registrant registered in advance. The response voice selection unit 34 selects, according to the determination result of the determination unit 33, a response voice to be output from the slave-side audio output unit 14 of the slave unit 10, and outputs the selected response voice to the slave-side audio output unit 14.
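The data flow through the response voice control device 30 (image acquisition unit 32, then determination unit 33, then response voice selection unit 34, then audio output unit 14) can be sketched as follows. This is a minimal illustration under stated assumptions: the `Determination` type, the three result categories, and the file names in `RESPONSE_VOICES` are hypothetical placeholders, because the actual response voices and their mapping to determination results are not given in this passage.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Determination:
    """Hypothetical output of determination unit 33."""
    recognized: bool                  # was a visitor (face) recognized in the image?
    registrant: Optional[str] = None  # matched registrant category, or None if unregistered


# Hypothetical placeholder voices; the real messages are not specified here.
RESPONSE_VOICES = {
    "unrecognized": "voice_unrecognized.wav",
    "unregistered": "voice_unregistered.wav",
    "registered": "voice_registered.wav",
}


def select_response_voice(result: Determination) -> str:
    """Role of response voice selection unit 34: pick a voice from the determination result."""
    if not result.recognized:
        return RESPONSE_VOICES["unrecognized"]
    if result.registrant is None:
        return RESPONSE_VOICES["unregistered"]
    return RESPONSE_VOICES["registered"]


def handle_visit(
    image: bytes,
    determine: Callable[[bytes], Determination],  # stands in for determination unit 33
    play: Callable[[str], None],                  # stands in for audio output unit 14
) -> str:
    """Run one captured image through determination, selection, and output."""
    result = determine(image)
    voice = select_response_voice(result)
    play(voice)
    return voice
```

Passing the determination and playback steps in as callables keeps the selection logic independent of where those functions run, which matches the note that the functions may live in the master unit, a separate device, or the cloud.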

The intercom device 1 is described in detail below, taking as an example a case in which the functions of the response voice control device 30 are integrated into the master unit 20. However, the functions of the response voice control device 30 may instead be provided in a device independent of both the master unit 20 and the slave unit 10, or may be distributed across a plurality of devices. At least some of the functions of the response voice control device 30 may be implemented by, for example, cloud computing.

(Slave unit 10)
The slave unit 10 includes an imaging unit 11, a slave-side operation unit 12, a slave-side sound pickup unit 13, a slave-side audio output unit 14, a slave-side communication unit 15, and a slave-side control unit 16.

The imaging unit 11 is a camera for imaging an imaging area. It has an image sensor such as a CCD (Charge-Coupled Device) or CMOS (Complementary Metal-Oxide-Semiconductor) sensor and an optical system such as a lens. The imaging unit 11 is positioned so that it can image the face of the visitor operating the slave unit 10, and it generates captured images during a call. "During a call" is the period from the start to the end of a call; it is set, for example, to the period from when the visitor presses the call button until a predetermined time elapses. The imaging unit 11 transmits the captured images as an image signal to the master unit 20 or the response voice control device 30 via the slave-side communication unit 15.

Although the imaging unit 11 of the present embodiment is a camera that captures moving images, it may instead capture still images. Likewise, although it captures color images, it may instead capture monochrome images. The imaging unit 11 may also be provided separately from the housing of the slave unit 10.

The slave-side operation unit 12 accepts input operations from the visitor and has input devices such as buttons, including a call button. The visitor can call the resident by pressing the call button included in the slave-side operation unit 12. On accepting the visitor's operation of the call button, the slave unit 10 transmits a call signal (control signal) for calling the resident inside the residence from the slave-side communication unit 15 to the master unit 20. The slave-side operation unit 12 may be implemented as a touch panel display.

The slave-side sound pickup unit 13 includes a microphone. It picks up the visitor's voice with the microphone and transmits the picked-up audio to the master unit 20 via the slave-side communication unit 15.

The slave-side audio output unit 14 outputs the audio signal input to it from a speaker: either audio based on an audio signal received from the master unit 20 via the slave-side communication unit 15, or a response voice based on the audio output by the response voice control device 30.

The slave-side communication unit 15 is used to exchange various signals (video signals, control signals, audio signals, etc.) with the master unit 20 or the response voice control device 30. It is electrically connected to the master unit 20 or the response voice control device 30 by wire or wirelessly, and is configured for bidirectional communication with them.

The slave-side control unit 16 is composed of, for example, a microcomputer whose main components are a processor and a memory. The microcomputer functions as the slave-side control unit 16 when the processor executes a program stored in the memory, thereby implementing the function of controlling the imaging unit 11, the slave-side operation unit 12, the slave-side sound pickup unit 13, the slave-side audio output unit 14, and the slave-side communication unit 15. Here the program executed by the processor is recorded in advance in the memory of the microcomputer, but it may instead be provided on a non-transitory recording medium such as a memory card, or through a telecommunication line such as the Internet.

On accepting the visitor's operation of the call button, the slave unit 10 causes the imaging unit 11 to start imaging and transmits the images captured by the imaging unit 11 to the master unit 20 or the response voice control device 30. When the resident performs an operation for a call (operation of the call start button), the slave unit 10 receives the resident's voice from the master unit 20 and outputs it, thereby conveying the resident's voice to the visitor. When the resident does not perform an operation for a call, the slave unit 10 outputs the response voice selected by the response voice control device 30, thereby conveying a response voice on the resident's side (for example, a voice message) to the visitor.

(Master unit 20)
The master unit 20 includes a display unit 21, a master-side operation unit 22, a master-side sound pickup unit 23, a master-side audio output unit 24, a master-side communication unit 25, a storage unit 26, and a master-side control unit 27. In addition, the master unit 20 includes the response voice control device 30.

The display unit 21 is a display such as an LCD (Liquid Crystal Display) that displays various images, such as captured images received from the slave unit 10 and recorded images saved in a storage memory. In addition to images, the display unit 21 may display various operation menus and the like.

The master-side operation unit 22 has input devices such as buttons (including a call start button and a call end button) and a numeric keypad, and accepts input operations from the resident. The resident can enter various settings and operations into the master unit 20 via the master-side operation unit 22. For example, the resident enters, via the master-side operation unit 22, presence information indicating whether the resident is at home or absent. Specifically, the resident can enter the presence information by pressing an at-home button or an absent button included in the master-side operation unit 22.

By pressing the call start button included in the master-side operation unit 22, the resident can talk with the visitor. On accepting the resident's operation of the call start button, the master unit 20 establishes a communication path between the master unit 20 and the slave unit 10, and transmits an audio signal of the resident's voice picked up by the master-side sound pickup unit 23 to the slave unit 10 via the master-side communication unit 25. On receiving the audio signal from the master unit 20, the slave unit 10 outputs the resident's voice based on that signal from the slave-side audio output unit 14.

The resident can end the call with the visitor (disconnect the communication path between the master unit 20 and the slave unit 10) by pressing the call end button. On accepting the resident's operation of the call end button, the master unit 20 stops displaying captured images on the display unit 21. At the same time, the slave unit 10 stops imaging by the imaging unit 11.

The display unit 21 and the master-side operation unit 22 may be integrated as a touch panel display.

The master-side sound pickup unit 23 includes a microphone and picks up the resident's voice with it.

The master-side audio output unit 24 outputs from a speaker the visitor's voice based on the audio signal received from the slave unit 10 via the master-side communication unit 25.

The master-side communication unit 25 is used to exchange various signals (control signals, audio signals, image signals, etc.) with the slave unit 10 or the response voice control device 30. It is electrically connected to the slave unit 10 and the response voice control device 30 by wire or wirelessly, and is configured for bidirectional communication with the slave unit 10.

The master-side control unit 27 is composed of, for example, a microcomputer whose main components are a processor and a memory. The microcomputer functions as the master-side control unit 27 when the processor executes a program stored in the memory, thereby implementing the function of controlling the display unit 21, the master-side operation unit 22, the master-side sound pickup unit 23, the master-side audio output unit 24, and the master-side communication unit 25. Here the program executed by the processor is recorded in advance in the memory of the microcomputer, but it may instead be provided on a non-transitory recording medium such as a memory card, or through a telecommunication line such as the Internet.

In the intercom device 1, when the resident presses the call start button of the master unit 20, a communication path is established between the master unit 20 and the slave unit 10, and a call between them begins. While the communication path is established, the intercom device 1 transmits and receives audio signals of the visitor's voice and of the resident's voice.

The response voice control device 30 has a presence information acquisition unit 31, an image acquisition unit 32, a determination unit 33, a response voice selection unit 34, and a recording control unit 35. The response voice control device 30 has two operation modes, an automatic response mode and a call response mode. It is normally set to the automatic response mode and switches to the call response mode when the call start button is pressed.

In the automatic response mode, the response voice control device 30 recognizes and authenticates the visitor on the basis of the captured image received from the slave unit 10, and then transmits to the slave unit 10 a response voice selected according to the result of at least one of the recognition and the authentication. On switching to the call response mode, by contrast, the response voice control device 30 ends this processing.
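The two operation modes of the response voice control device 30 can be pictured as a small state machine. The class, method, and callback names below are hypothetical; `select_voice` stands in for the recognition, authentication, and selection steps, and what happens after a call ends (for example, whether the device returns to the automatic response mode) is not described here and so is not modeled.

```python
from enum import Enum, auto
from typing import Callable, Optional


class Mode(Enum):
    AUTO_RESPONSE = auto()  # default: answer the visitor with a selected response voice
    CALL_RESPONSE = auto()  # resident has pressed call start; automatic processing ends


class ResponseVoiceController:
    """Hypothetical model of the operation modes of response voice control device 30."""

    def __init__(self) -> None:
        self.mode = Mode.AUTO_RESPONSE  # normally set to the automatic response mode

    def on_call_start_pressed(self) -> None:
        self.mode = Mode.CALL_RESPONSE

    def on_captured_image(self, image: bytes,
                          select_voice: Callable[[bytes], str]) -> Optional[str]:
        """Return the response voice to send to slave unit 10, or None in call response mode."""
        if self.mode is Mode.CALL_RESPONSE:
            return None  # processing ends; the resident's own voice is relayed instead
        return select_voice(image)
```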

The presence information acquisition unit 31 acquires presence information indicating whether the resident is at home or absent. For example, it acquires presence information that the resident has entered in advance via the master-side operation unit 22.

Alternatively, if the response voice control device 30 is configured to communicate wirelessly with a pre-registered mobile terminal, the presence information acquisition unit 31 may acquire, via a network or the like, current position information of the mobile terminal carried by the resident. In that case, the presence information acquisition unit 31 may be configured to determine that the resident is at home when the current position information indicates that the mobile terminal is within a predetermined distance of the resident's address, and that the resident is absent when it indicates that the terminal is not within that distance.
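The distance test can be sketched with the haversine great-circle formula. The sketch assumes that the resident's address has already been converted to latitude and longitude and that the "predetermined distance" is a configurable radius; the function names and the 100 m default are illustrative choices, not values from the source.

```python
import math
from typing import Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_m(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in metres between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def presence_from_position(terminal: LatLon, home: LatLon, radius_m: float = 100.0) -> str:
    """'at_home' if the terminal is within radius_m of the (assumed geocoded) home, else 'absent'."""
    return "at_home" if haversine_m(terminal, home) <= radius_m else "absent"
```

A radius of a few tens to hundreds of metres absorbs typical GNSS error. The radius is a tuning parameter: too small and a resident at home may be reported absent, too large and a nearby neighbour's location could count as home.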

Examples of the mobile terminal include electronic devices such as notebook PCs, tablet terminals, and smartphones. The presence information acquisition unit 31 acquires the current position information from the mobile terminal carried by the resident. The current position information is, for example, the terminal's position obtained by a built-in positioning function, such as a GNSS (Global Navigation Satellite System) receiver, e.g., GPS (Global Positioning System). The position information may also be the position of an access point or the like with which the mobile terminal communicates.

If the response voice control device 30 is configured to communicate by wire or wirelessly with an electric lock provided on the entrance door, the presence information acquisition unit 31 may acquire information on the state of the electric lock from the lock as presence information. The entrance door here is the entrance door of the residence in which the intercom device 1 is installed. In this case, the presence information acquisition unit 31 may be configured to acquire presence information indicating that the resident is absent during the period from when the electric lock is locked from outside the residence until it is unlocked from outside the residence, and presence information indicating that the resident is at home outside that period.
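The lock-based rule amounts to a two-state tracker driven by lock events. In the sketch below, the event format (an action string plus a `from_outside` flag), the class name, and the assumption that the resident starts out at home are all hypothetical.

```python
class LockBasedPresence:
    """Hypothetical tracker that derives presence from electric-lock events.

    The resident is treated as absent from a lock-from-outside event until the
    next unlock-from-outside event, and as at home at all other times.
    """

    def __init__(self) -> None:
        self._absent = False  # assumption: at home until the first outside lock

    def on_lock_event(self, action: str, from_outside: bool) -> None:
        # action is "lock" or "unlock"; operations from inside do not change the state
        if not from_outside:
            return
        if action == "lock":
            self._absent = True
        elif action == "unlock":
            self._absent = False

    @property
    def presence(self) -> str:
        return "absent" if self._absent else "at_home"
```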

The image acquisition unit 32 acquires the captured image received from the slave unit 10.

The determination unit 33 has a recognition unit 41 that recognizes the visitor on the basis of the captured image and an authentication unit 42 that authenticates the visitor on the basis of the captured image. For example, the recognition unit 41 performs face recognition: it extracts from the captured image feature quantities of characteristic parts of the visitor's face, such as the eyes, nose, and mouth, and recognizes the region from which these feature quantities were extracted as a person's face image. It thereby determines whether the visitor's face could be recognized, in other words, whether the visitor could be recognized as a person. The recognition unit 41 determines that recognition has succeeded when a person's face image can be recognized. Conversely, it determines that recognition has failed when no face image can be recognized, for example because the visitor is outside the imaging area or is hiding their face, or when no face (person) can be recognized in the captured image at all.
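The recognition decision can be illustrated with a sketch that assumes an upstream face-landmark detector (not specified in the source) has already returned a position and confidence for each facial part. The landmark names, the confidence threshold, and the margin used to enlarge the face region are hypothetical. Recognition is treated as successful only when the eyes, nose, and mouth are all found, which mirrors the failure cases above where the face is hidden or out of frame.

```python
from typing import Dict, Optional, Tuple

Landmark = Tuple[float, float, float]    # (x, y, confidence) from a hypothetical detector
Box = Tuple[float, float, float, float]  # (left, top, right, bottom)

REQUIRED_PARTS = ("left_eye", "right_eye", "nose", "mouth")


def recognize_face(landmarks: Dict[str, Landmark],
                   min_conf: float = 0.5, margin: float = 0.5) -> Optional[Box]:
    """Return the face region if all required parts are detected, else None (recognition failed)."""
    points = []
    for part in REQUIRED_PARTS:
        lm = landmarks.get(part)
        if lm is None or lm[2] < min_conf:
            return None  # e.g. face hidden or visitor outside the imaging area
        points.append((lm[0], lm[1]))
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    w, h = max(xs) - min(xs), max(ys) - min(ys)
    # Enlarge the bounding box of the parts by a margin to approximate the whole face region.
    return (min(xs) - margin * w, min(ys) - margin * h,
            max(xs) + margin * w, max(ys) + margin * h)
```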

The authentication unit 42 performs face authentication of the visitor by comparing the face image extracted by the recognition unit 41 with registrant data read from the storage unit 26. If the registrant data contains a face image that matches the face image extracted from the captured image, the authentication unit 42 determines that authentication has succeeded; otherwise, it determines that authentication has failed.

Using acquaintance data (described later) as the registrant data, the authentication unit 42 authenticates that the visitor is an acquaintance of the resident. For example, the authentication unit 42 compares the captured image with the resident's acquaintance data read from the storage unit 26 and authenticates the visitor as an acquaintance when the quantified degree of match between the two exceeds a predetermined threshold. Alternatively, the authentication unit 42 may use deep learning: it inputs the captured image into an acquaintance-trained model obtained by machine learning with, for example, a neural network, and the model outputs as a numerical value the degree to which the captured image shows an acquaintance of the resident. The acquaintance-trained model is, for example, a model trained by feeding a neural network input data that pairs captured images with teacher data indicating whether each image contains an acquaintance. Other known techniques may also be used to authenticate that the visitor is an acquaintance of the resident.
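The threshold-based comparison can be sketched as a nearest-neighbour search over face embeddings scored by cosine similarity. This is one common way to quantify a "degree of match" and is used here only for illustration. The source does not specify the feature representation, the similarity measure, or the threshold value, so the embedding vectors, `authenticate`, and the 0.8 default are all assumptions.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def authenticate(probe: Sequence[float],
                 acquaintances: Dict[str, Sequence[float]],
                 threshold: float = 0.8) -> Optional[str]:
    """Return the best-matching acquaintance ID if its score exceeds threshold, else None."""
    best_id, best_score = None, -1.0
    for person_id, embedding in acquaintances.items():
        score = cosine_similarity(probe, embedding)
        if score > best_score:
            best_id, best_score = person_id, score
    return best_id if best_score > threshold else None
```

Raising the threshold trades fewer false acceptances of strangers for more cases where a genuine acquaintance is treated as unregistered.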

The authentication unit 42 can also use specific trader data, described later, as registrant data to authenticate that the visitor is a specific trader, such as a delivery company. For example, the authentication unit 42 can authenticate the visitor as a specific trader when the quantified degree of matching between the captured image and the specific trader data read from the storage unit 26 exceeds a predetermined threshold. The authentication unit 42 may also authenticate the visitor as a specific trader by deep learning, using a specific trader trained model obtained by machine learning with the neural network described above.

Furthermore, the authentication unit 42 can use suspicious person data, described later, as registrant data to authenticate that the visitor is a suspicious person. For example, the authentication unit 42 can compare the captured image with the suspicious person data read from the storage unit 26 and authenticate the visitor as a suspicious person when the quantified degree of matching between the two exceeds a predetermined threshold. The authentication unit 42 may also authenticate the visitor as a suspicious person by deep learning, using a suspicious person trained model obtained by machine learning with the neural network described above.

The authentication unit 42 of the present embodiment can set a security level, divided into the following four levels according to the categories of pre-registered registrants.

Level 1: acquaintances
Level 2: acquaintances + specific traders
Level 3: acquaintances + specific traders + non-suspicious persons
Level 4: acquaintances + specific traders + non-suspicious persons + suspicious persons

When the security level is set to level 1, the authentication unit 42 determines that the visitor is a registrant if the visitor is an acquaintance.

When the security level is set to level 2, the authentication unit 42 determines that the visitor is a registrant if the visitor is an acquaintance or a specific trader.

When the security level is set to level 3, the authentication unit 42 determines that the visitor is a registrant if the visitor is an acquaintance, a specific trader, or a non-suspicious person.

When the security level is set to level 4, the authentication unit 42 does not perform authentication for any visitor (acquaintances, specific traders, non-suspicious persons, or suspicious persons).
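The four security levels amount to a lookup from level to the visitor categories that count as registrants. A minimal sketch, assuming the visitor's category has already been determined by the authentication unit 42; the enum, table, and function names are hypothetical. Because the text states only that no authentication is performed at level 4, the sketch returns `None` for that level instead of guessing at the downstream behavior.

```python
from enum import Enum
from typing import Optional


class VisitorType(Enum):
    ACQUAINTANCE = "acquaintance"
    SPECIFIC_TRADER = "specific trader"
    NON_SUSPICIOUS = "non-suspicious person"
    SUSPICIOUS = "suspicious person"


# Visitor types treated as registrants at levels 1 to 3 (level 4 is handled separately).
REGISTRANT_TYPES = {
    1: {VisitorType.ACQUAINTANCE},
    2: {VisitorType.ACQUAINTANCE, VisitorType.SPECIFIC_TRADER},
    3: {VisitorType.ACQUAINTANCE, VisitorType.SPECIFIC_TRADER, VisitorType.NON_SUSPICIOUS},
}


def is_registrant(visitor_type: VisitorType, security_level: int) -> Optional[bool]:
    """Return whether the visitor counts as a registrant at the given security level.

    Returns None at level 4, where no authentication is performed.
    """
    if security_level == 4:
        return None
    if security_level not in REGISTRANT_TYPES:
        raise ValueError(f"unknown security level: {security_level}")
    return visitor_type in REGISTRANT_TYPES[security_level]
```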

The recording control unit 35 stores or discards a recorded image, that is, a recording of the images captured by the imaging unit 11. In the automatic response mode, the recording control unit 35 stores the recorded image in the storage memory of the storage unit 26 if the determination unit 33 determines that the visitor is an unregistered person rather than a registrant, or if the at-home information acquired by the at-home information acquisition unit 31 indicates that the resident is absent.

When the resident requests playback through an input operation, the response voice control device 30 reads the recorded image from the storage unit 26 and displays it on the display unit 21, and, in sync with the recorded image, outputs from the master unit side audio output unit 24 the visitor's voice recorded via the slave unit side sound pickup unit 13. This allows the resident to check the recorded image and the visitor's business. In the present embodiment, the recording control unit 35 is configured to discard the recorded image upon switching to the call response mode, but it may instead be configured to keep the recorded image in the call response mode as well.
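The save and discard rules of the recording control unit 35 can be summarized in a small model. This is a hypothetical sketch: the class and method names, and representing a recording as `bytes`, are assumptions for illustration. The `keep_in_call_mode` flag corresponds to the variant in which recordings are also kept in the call response mode.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RecordingController:
    """Hypothetical model of the save/discard rules of recording control unit 35."""

    keep_in_call_mode: bool = False  # False: embodiment (discard); True: variant (keep)
    saved: List[bytes] = field(default_factory=list)

    def end_auto_response(self, clip: bytes, is_registrant: bool, resident_at_home: bool) -> bool:
        """Automatic response mode: save if the visitor is unregistered or the resident is absent."""
        if not is_registrant or not resident_at_home:
            self.saved.append(clip)
            return True
        return False  # registered visitor and resident at home: nothing is saved

    def enter_call_mode(self, clip: bytes) -> bool:
        """Call response mode: discard the clip unless the variant setting keeps it."""
        if self.keep_in_call_mode:
            self.saved.append(clip)
            return True
        return False
```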

The response voice selection unit 34 selects the response voice to be output from the slave unit side audio output unit 14 according to the determination result of the determination unit 33, and transmits (outputs) the selected response voice to the slave unit side audio output unit 14 via the master unit side communication unit 25. Based on at least one of the recognition and the authentication performed by the determination unit 33, the response voice selection unit 34 determines whether the visitor has been recognized and whether the visitor is a registrant. For example, when the visitor cannot be recognized or is not a registrant, the response voice selection unit 34 outputs a synthetic voice to the slave unit side audio output unit 14. When the visitor is a registrant, for example, the response voice selection unit 34 outputs the resident's own voice or a recording of the resident's voice to the slave unit side audio output unit 14.

The response voice selection unit 34 may also select, from among multiple types of voice messages, a different voice message for each type of visitor (registrant, specific trader, non-suspicious person, or suspicious person) identified through recognition and authentication by the determination unit 33, and output it to the slave unit side audio output unit 14. When the response voice control device 30 is in the call response mode, a call is possible between the master unit 20 and the slave unit 10, so the voice uttered by the resident is output from the slave unit side audio output unit 14.
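The selection rule of the response voice selection unit 34 can be sketched as a simple mapping. This is an illustrative assumption, not the disclosed implementation: the enum, the function name, and the assignment of the example messages "Who are you?", "What is your business?", and "Please remove your mask." (listed later in this description) to particular visitor types are all hypothetical.

```python
from enum import Enum
from typing import Optional, Tuple


class VisitorKind(Enum):
    REGISTRANT = "registrant"
    SPECIFIC_TRADER = "specific trader"
    NON_SUSPICIOUS = "non-suspicious person"
    SUSPICIOUS = "suspicious person"


# Hypothetical message assignment; the key None stands for "visitor could not be recognized".
SYNTHETIC_MESSAGES = {
    None: "Who are you?",
    VisitorKind.SPECIFIC_TRADER: "What is your business?",
    VisitorKind.NON_SUSPICIOUS: "What is your business?",
    VisitorKind.SUSPICIOUS: "Please remove your mask.",
}


def select_response(kind: Optional[VisitorKind]) -> Tuple[str, Optional[str]]:
    """Return (voice source, synthetic message) for the determined visitor kind."""
    if kind is VisitorKind.REGISTRANT:
        # Registrants hear the resident's live voice or the resident's recorded voice.
        return ("resident", None)
    # Unrecognized or non-registrant visitors only ever hear a synthetic voice.
    return ("synthetic", SYNTHETIC_MESSAGES[kind])
```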

The storage unit 26 is composed of memory selected from ROM (Read Only Memory), RAM (Random Access Memory), nonvolatile memory, and the like.

The storage unit 26 has a storage memory for storing recorded images. The storage unit 26 stores each recorded image in the storage memory in association with the recorded voice of the visitor. Besides being provided in the master unit 20, the storage unit 26 may be provided in the response voice control device 30.

The storage unit 26 also stores registrant data in advance in an authentication memory. The registrant data is generated automatically from, for example, photo data containing the face image of each registrant stored in an image folder of a mobile terminal. The resident can have the storage unit 26 store the registrant data by transmitting it from the mobile terminal to the response voice control device 30 over a network. The data that the response voice control device 30 receives from the mobile terminal can be edited by adding, changing, or deleting face images via the master unit side operation unit 22. Thus, if the received data includes the face image of a person the resident does not want registered, that face image can be deleted, which keeps the registrant data under the resident's control.

The storage unit 26 also stores, in advance in the authentication memory, specific trader data for specific traders such as delivery companies. The specific trader data is data for identifying a specific trader. Suitable examples include image information (color, pattern, etc.) on the uniforms and caps worn by the specific trader, and image information on the distinctive logo of the company to which the specific trader belongs.

Furthermore, the storage unit 26 stores, in advance in the authentication memory, suspicious person data on suspicious persons. The criteria for a suspicious person are, for example, the physical features (such as the face) of persons whom most people would judge to be "suspicious" or "dangerous". Specific examples of suspicious persons include a person hiding their face by covering it with a mask, sunglasses, hands, or the like, and a person turning their face away (a person seen in profile).

The storage unit 26 also stores at least one of a recorded voice and a synthetic voice in advance in a response voice memory. The recorded voice is, for example, the resident's speech recorded in advance via the master unit side sound pickup unit 23. The synthetic voice is, for example, speech generated mechanically in advance. The response voice control device 30 may instead be configured to obtain the recorded voice and the synthetic voice from an external device. The recorded voice and the synthetic voice are preferably voice messages that ask the visitor's business, convey a request to the visitor, or the like.

Specific examples of such voice messages include "Who are you?", "What is your business?", and "Please remove your mask." Via the master unit side operation unit 22, the resident can choose which of the multiple types of voice messages stored in the response voice memory of the storage unit 26 are candidates for the response voice.

The storage memory, authentication memory, and response voice memory described above are rewritable memories and are preferably nonvolatile memories such as an HDD (Hard Disk Drive) or flash memory (SSD: Solid State Drive). When the intercom device 1 is equipped with artificial intelligence (AI), the response voice control device 30 can accumulate multiple sets of past data and trained models. The response voice control device 30 is also capable of machine learning using frameworks such as neural networks and deep learning.

Next, an example of the operation of the intercom device 1 is described, focusing on the processing performed by the response voice control device 30. FIG. 2 is a flowchart illustrating the operation of the intercom device shown in FIG. 1. The processing flow in FIG. 2 is described using the example in which a synthetic voice is selected and output as the response voice when the visitor is not a registrant or when the at-home information indicates absence. However, if there is no security concern, a voice other than a synthetic voice (for example, a recording of the resident's voice) may be selected and output.

When the visitor presses the call button to perform a calling operation on the slave unit 10, a call signal is transmitted from the slave unit 10 to the master unit 20, and the processing flow shown in FIG. 2 starts. The slave unit 10 activates the imaging unit 11 to start imaging and transmits the image signal of the captured image to the image acquisition unit 32 of the response voice control device 30. At the same time, the master unit 20, having received the call signal and the image signal, sounds a ringtone from the speaker of the master unit side audio output unit 24 and displays the captured image on the display unit 21.

In step S11, when the at-home information acquisition unit 31 acquires the at-home information, whether the resident is at home is determined on the basis of that information. If the at-home information indicates that the resident is at home, the process proceeds to step S12. If it indicates that the resident is absent, the process proceeds to step S21.

If the at-home information indicates that the resident is at home, in step S12 the determination unit 33 recognizes and authenticates the visitor based on the captured image acquired by the image acquisition unit 32.

If the determination unit 33 determines that the visitor cannot be recognized or is not a registrant (step S12: YES), the process proceeds to step S13. If the determination unit 33 determines that the visitor is a registrant (step S12: NO), the process proceeds to step S31.

In step S13, the response voice selection unit 34 selects a synthetic voice from the response voice candidates stored in the storage unit 26. The response voice control device 30 transmits the synthetic voice selected by the response voice selection unit 34 to the slave unit 10. At the same time, the recording control unit 35 starts recording the captured image. The process then proceeds to step S14.

In step S14, it is determined whether a call has started between the master unit 20 and the slave unit 10. This determination can be based on whether the master unit side operation unit 22 has received the resident's operation of a call start button. If it is determined that the call has started (step S14: YES), the process proceeds to step S15. If it is determined that the call has not started (step S14: NO), the process proceeds to step S22.

When it is determined that the call has started, in step S15 the recording control unit 35 discards the recorded image. The processing flow shown in FIG. 2 then ends.

On the other hand, if the at-home information indicates absence in step S11, then in step S21 the response voice selection unit 34 selects a synthetic voice from the response voice candidates stored in the response voice memory of the storage unit 26. The response voice control device 30 transmits the synthetic voice selected by the response voice selection unit 34 to the slave unit 10. At the same time, the recording control unit 35 starts recording the captured image. Then, in step S22, the recording control unit 35 saves the recorded image, and the processing flow shown in FIG. 2 ends.

If it is determined in step S12 that the visitor is a registrant, then in step S31 the response voice selection unit 34 switches from the automatic response mode to the call response mode, so that the resident's voice picked up by the master unit side sound pickup unit 23 is selected, and the processing flow shown in FIG. 2 ends.
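The branch structure of FIG. 2 (steps S11 to S31) can be traced with the sketch below. It models only the control flow: the function name and boolean inputs are hypothetical, and each step's action (sending the synthetic voice, recording, switching modes) is represented by its step label rather than performed.

```python
from typing import List


def fig2_path(resident_at_home: bool, is_registrant: bool, call_started: bool) -> List[str]:
    """Return the sequence of FIG. 2 steps visited for the given conditions.

    is_registrant is False when the visitor cannot be recognized or is unregistered.
    """
    path = ["S11"]                      # check the at-home information
    if not resident_at_home:
        path += ["S21", "S22"]          # synthetic voice + start recording; save recording
        return path
    path.append("S12")                  # recognize and authenticate the visitor
    if is_registrant:                   # S12: NO
        path.append("S31")              # call response mode: resident's own voice
        return path
    path += ["S13", "S14"]              # synthetic voice + start recording; call started?
    path.append("S15" if call_started else "S22")  # discard (S15) or save (S22)
    return path
```

In this example flow, an absent resident sends the process straight from S11 to S21, so no recognition is performed and every visitor hears the synthetic voice.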

As described above, the intercom device 1 according to the present embodiment includes the image acquisition unit 32, the determination unit 33, and the response voice selection unit 34 described above.

With this configuration, different voice messages can be output depending on the type of visitor. The visitor's business can therefore be ascertained without revealing to the visitor whether the resident is at home or away. As a result, residents can protect themselves from suspicious visitors.

In the intercom device 1, when the determination unit 33 determines that the visitor is not a registrant, the response voice selection unit 34 uses a pre-stored synthetic voice as the response voice.

With this configuration, a registered visitor is answered with the resident's own voice or a recording of it, whereas an unrecognizable or unregistered visitor is answered with a synthetic voice. Visitors other than recognizable registrants therefore never hear the resident's voice, which avoids the risk of their inferring personal information such as the resident's gender and prevents personal information, such as the fact that the resident is at home, from becoming known.

In the intercom device 1, the determination unit 33 further uses pre-stored suspicious person data to authenticate whether the visitor is a suspicious person, and when the determination unit 33 authenticates the visitor as a suspicious person, the response voice selection unit 34 uses a synthetic voice as the response voice.

With this configuration, suspicious persons can be detected, and a suspicious person never hears the resident's voice. This avoids the risk of personal information such as the resident's gender being inferred, and prevents personal information, such as the fact that the resident is at home, from becoming known. As a result, residents can protect themselves from suspicious persons even more effectively.

The intercom device 1 also includes the at-home information acquisition unit 31, which acquires at-home information indicating whether the resident is at home in the residence. The intercom device 1 further includes the recording control unit 35, which saves a recorded image of the images captured by the imaging unit 11 when the determination unit 33 determines that the visitor is not a registrant, or when the at-home information acquired by the at-home information acquisition unit 31 indicates absence.

With this configuration, the resident can obtain information on unregistered visitors by checking the recorded image and the recorded voice of the visitor. Likewise, even if the resident is away when a visitor calls, the resident can obtain information on visitors who came during their absence by checking the recorded image and the recorded voice.

In the intercom device 1, when the determination unit 33 determines that the visitor is a registrant and the at-home information acquired by the at-home information acquisition unit 31 indicates absence, the response voice selection unit 34 uses a pre-recorded voice as the response voice.

With this configuration, a registered visitor can be asked about their business even when the resident is away. By checking the recorded image and the recorded voice of the visitor, the resident can learn about visits made during their absence: whether anyone visited, who the visitor was, and what the visitor's business was.
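Combined with the rule that non-registrants hear a synthetic voice, the configuration above yields a three-way choice of voice source. A minimal sketch with hypothetical names and return labels; note that the FIG. 2 example instead uses a synthetic voice whenever the resident is absent, so this corresponds to the alternative the description allows when there is no security concern.

```python
def choose_voice_source(is_registrant: bool, resident_at_home: bool) -> str:
    """Pick the response voice source from the registrant check and the at-home information."""
    if not is_registrant:
        return "synthetic"          # unrecognized or unregistered visitor
    if resident_at_home:
        return "resident_live"      # call response mode: the resident answers in person
    return "resident_recorded"      # registrant, resident away: pre-recorded voice
```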

As described in detail above, the present embodiment provides an intercom device 1 that can ensure the privacy and security of residents.

The present disclosure is not limited to the above embodiment and can be modified as appropriate without departing from its spirit. For example, whereas the above embodiment uses suspicious person data to authenticate suspicious persons, the storage unit 26 may instead store non-suspicious person data on non-suspicious persons. The criteria for a non-suspicious person can be, for example, the physical features (such as the face) of persons whom most people would judge to "look calm". When non-suspicious persons are authenticated using non-suspicious person data, the authentication unit 42 can, for example, compare the captured image with the non-suspicious person data read from the storage unit 26 and authenticate the visitor as a non-suspicious person depending on whether the quantified degree of matching between the two exceeds a predetermined threshold.

The present disclosure contributes to achieving SDG 11, "Sustainable cities and communities", and includes matters that contribute to the safety and security of public facilities.

1 intercom device
10 slave unit
11 imaging unit
12 slave unit side operation unit
13 slave unit side sound pickup unit
14 slave unit side audio output unit
15 slave unit side communication unit
16 slave unit side control unit
20 master unit
21 display unit
22 master unit side operation unit
23 master unit side sound pickup unit
24 master unit side audio output unit
25 master unit side communication unit
26 storage unit
27 master unit side control unit
30 response voice control device
31 at-home information acquisition unit
32 image acquisition unit
33 determination unit
34 response voice selection unit
35 recording control unit
41 recognition unit
42 authentication unit

Claims

1. An intercom device including an intercom slave unit used by a visitor, the intercom device comprising:
an image acquisition unit that acquires a captured image captured by an imaging unit that captures an image of the visitor;
a determination unit that determines, based on the captured image acquired by the image acquisition unit, at least one of whether the visitor has been recognized and whether the visitor is a pre-registered registrant; and
a response voice selection unit that selects a response voice to be output from a slave unit side audio output unit of the intercom slave unit according to a determination result of the determination unit, and outputs the selected response voice to the slave unit side audio output unit.

2. The intercom device according to claim 1, wherein
the response voice selection unit uses a pre-stored synthetic voice as the response voice when the determination unit determines that the visitor is not the registrant.

3. The intercom device according to claim 1 or 2, wherein
the determination unit further authenticates whether the visitor is a suspicious person using pre-stored suspicious person data relating to suspicious persons, and
the response voice selection unit uses a pre-stored synthetic voice as the response voice when the determination unit authenticates the visitor as the suspicious person.

4. The intercom device according to any one of claims 1 to 3, further comprising:
an at-home information acquisition unit that acquires at-home information indicating whether a resident is at home in a residence; and
a recording control unit that saves a recorded image obtained by recording the captured image captured by the imaging unit when the determination unit determines that the visitor is not the registrant, or when the at-home information acquired by the at-home information acquisition unit indicates absence.

5. The intercom device according to claim 4, wherein
the response voice selection unit uses a pre-recorded voice as the response voice when the determination unit determines that the visitor is the registrant and the at-home information acquired by the at-home information acquisition unit indicates absence.