JP2021140114A

JP2021140114A - Information terminal, intercom system, processing method and program

Info

Publication number: JP2021140114A
Application number: JP2020040037A
Authority: JP
Inventors: 奨士永井; Shoji Nagai; 俊彦八木; Toshihiko Yagi; 剛桑野; Takeshi Kuwano
Original assignee: Panasonic Intellectual Property Management Co Ltd
Current assignee: Panasonic Intellectual Property Management Co Ltd
Priority date: 2020-03-09
Filing date: 2020-03-09
Publication date: 2021-09-16

Abstract

To provide an information terminal, an intercom system, a processing method, and a program that can further improve the accuracy of voice recognition. SOLUTION: An information terminal 10 operates as an intercom device. The information terminal 10 includes a sound acquisition unit 13, a voice recognition unit 182, and a control processing unit 183. The sound acquisition unit 13 acquires sound including at least a user's voice. The voice recognition unit 182 performs voice recognition based on the sound acquired by the sound acquisition unit 13. The control processing unit 183 performs control on the basis of the voice recognition result of the voice recognition unit 182. The voice recognition unit 182 is configured so that the information referred to in voice recognition can be changed. SELECTED DRAWING: Figure 1

Description

The present disclosure relates generally to information terminals, intercom systems, processing methods, and programs, and more specifically to an information terminal, an intercom system, a processing method, and a program configured to be capable of voice recognition.

Intercom systems used in apartment houses and the like are conventionally known (see, for example, Patent Document 1).

The intercom system of Patent Document 1 includes a lobby intercom installed at the common entrance of an apartment house, an intercom master unit installed inside each dwelling unit, and a doorphone slave unit installed outside each dwelling unit (at its front door).

In such an intercom system, a visitor calls the intercom master unit from the lobby intercom. When the resident of the dwelling unit responds by performing a predetermined operation on the intercom master unit, a call starts between the lobby intercom and the intercom master unit. Likewise, when the intercom master unit is called from the doorphone slave unit, a call starts between the doorphone slave unit and the intercom master unit once the resident performs the predetermined operation on the intercom master unit.

Patent Document 1: Japanese Unexamined Patent Publication No. 2004-64249

Some systems accept, as the predetermined operation, a spoken instruction from the resident of the dwelling unit and perform voice recognition on that speech. Such voice recognition is expected to be accurate without increasing the processing load.

The present disclosure has been made in view of the above problem, and its object is to provide an information terminal, an intercom system, a processing method, and a program capable of performing voice recognition accurately without increasing the processing load.

An information terminal according to one aspect of the present disclosure is an information terminal that operates as an intercom device. The information terminal includes a sound acquisition unit, a voice recognition unit, and a control processing unit. The sound acquisition unit acquires sound including at least a user's voice. The voice recognition unit performs voice recognition based on the sound acquired by the sound acquisition unit. The control processing unit performs control based on the voice recognition result of the voice recognition unit. The voice recognition unit is configured so that the information referred to in voice recognition can be changed.

An intercom system according to one aspect of the present disclosure includes the information terminal and an intercom entrance device that communicates with the information terminal.

A processing method according to one aspect of the present disclosure is a processing method used in an information terminal that operates as an intercom device. The processing method includes a sound acquisition step, a voice recognition step, and a control processing step. In the sound acquisition step, sound including at least a user's voice is acquired. In the voice recognition step, voice recognition processing is performed based on the sound acquired in the sound acquisition step. In the control processing step, control is performed based on the voice recognition result of the voice recognition step. The voice recognition processing is configured so that the information referred to in voice recognition can be changed.

A program according to one aspect of the present disclosure is a program for causing a computer to execute the processing method.

According to the present disclosure, voice recognition can be performed accurately without increasing the processing load.

FIG. 1 is a block diagram illustrating the configuration of an information terminal according to an embodiment.
FIG. 2 is a diagram illustrating the system configuration of an intercom system including the information terminal.
FIG. 3 is a diagram illustrating the operation of the information terminal when adding a dictionary file.
FIG. 4 is a diagram illustrating the operation of the information terminal when performing voice recognition.
FIG. 5 is a diagram illustrating an information terminal according to Modification 1.

The embodiment and modifications described below are merely examples of the present disclosure, and the present disclosure is not limited to them. Beyond the following embodiment and modifications, various changes can be made according to the design and the like, as long as they do not depart from the technical idea of the present disclosure.

(Embodiment)
Hereinafter, an intercom system 1 including an information terminal 10 that operates as an intercom device according to the present embodiment will be described with reference to FIGS. 1 to 4.

(1) Outline
Hereinafter, the information terminal 10 according to the present embodiment will be described.

As shown in FIG. 2, the information terminal 10 according to the present embodiment is applied to the intercom system 1. The intercom system 1 is applied to, for example, an apartment house 5 such as a condominium. The intercom system 1 according to the present embodiment includes the information terminal 10. In the present embodiment, the intercom system 1 includes a plurality of (two in FIG. 2) information terminals 10, each of which operates as an intercom device. The intercom system 1 further includes a lobby intercom 20 (intercom entrance device), a control device 30, and a plurality of (two in FIG. 2) entrance slave units 40 (intercom entrance devices). In the intercom system 1, each of the information terminals 10 communicates with the lobby intercom 20 via the control device 30. In addition, the information terminals 10 and the entrance slave units 40 correspond one-to-one. The intercom system 1 according to the present embodiment may be applied to a detached house instead of the apartment house 5. Alternatively, the intercom system 1 may be applied to a non-residential facility such as an office, a store, a school, or a nursing care facility.

Each of the information terminals 10 is, for example, a dwelling unit terminal (intercom master unit) provided in one of the dwelling units E2 of the apartment house 5. Each information terminal 10 is provided, for example, at the inner entrance of its dwelling unit E2. Each information terminal 10 is connected to the control device 30 via a second trunk line 62, a branch line 63, and a branching unit 50. Each information terminal 10 is configured to communicate (for example, to hold a call and transmit control signals) with the lobby intercom 20 via the control device 30. Further, each information terminal 10 is connected to its corresponding entrance slave unit 40 via a connection line 64 and is configured to communicate (for example, to hold a call and transmit control signals) with that entrance slave unit 40.

The lobby intercom 20 is provided, for example, at the common entrance (lobby) E1 of the apartment house 5, where it is attached to, for example, a wall. The lobby intercom 20 is connected to the control device 30 via a first trunk line 61. The lobby intercom 20 is configured to communicate (for example, to hold a call and transmit video signals) with each information terminal 10 via the control device 30. When the lobby intercom 20 transmits a video signal to an information terminal 10, that information terminal 10 can display the video (image).

The control device 30 is provided, for example, in the management room E3 of the apartment house 5. The control device 30 is connected to the lobby intercom 20 via the first trunk line 61 and to each information terminal 10 via the second trunk line 62. That is, the control device 30 is configured to relay communication between each information terminal 10 and the lobby intercom 20.

Each of the entrance slave units 40 is provided, for example, at the outer entrance of a dwelling unit E2 of the apartment house 5. Each entrance slave unit 40 is connected to its corresponding information terminal 10 via the connection line 64 and is configured to communicate (for example, to hold a call and transmit video signals) with that information terminal 10.
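The connection relationships described above can be summarized in a small data model. The following Python sketch is illustrative only: the class name `IntercomSystem`, the string identifiers, and the `route` method are assumptions introduced here, and the wiring (trunk lines, branch line, branching unit) is deliberately omitted. It shows only that calls from the lobby are relayed through the control device, while each entrance slave unit talks directly to its one paired terminal.

```python
from dataclasses import dataclass, field


@dataclass
class IntercomSystem:
    """Hypothetical model of the connections in FIG. 2 (names are illustrative)."""
    lobby_intercom: str = "lobby_intercom_20"
    control_device: str = "control_device_30"
    # information terminal -> its paired entrance slave unit (one-to-one)
    pairs: dict = field(default_factory=dict)

    def add_dwelling(self, terminal: str, slave_unit: str) -> None:
        # Enforce the one-to-one correspondence between terminals and slave units.
        if terminal in self.pairs or slave_unit in self.pairs.values():
            raise ValueError("terminals and slave units must pair one-to-one")
        self.pairs[terminal] = slave_unit

    def route(self, source: str, terminal: str) -> list:
        """Return the devices a call passes through from source to terminal."""
        if terminal not in self.pairs:
            raise KeyError(terminal)
        if source == self.lobby_intercom:
            # Lobby calls are relayed by the control device.
            return [source, self.control_device, terminal]
        if source == self.pairs[terminal]:
            # The entrance slave unit is connected directly to its terminal.
            return [source, terminal]
        raise ValueError(f"{source} cannot call {terminal}")
```

In this model, a slave unit attempting to reach a terminal other than its own is rejected, which mirrors the one-to-one pairing in the text.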

In the present embodiment, the first trunk line 61, the second trunk line 62, the branch line 63, and the connection line 64 are all twisted-pair wires. That is, each of them actually consists of two electric wires but is drawn as a single line in the figures. At least one of the first trunk line 61, the second trunk line 62, the branch line 63, and the connection line 64 may be an electric wire other than a twisted-pair wire.

The information terminal 10 according to the present embodiment acquires the voice of a user in the dwelling unit E2 and performs voice recognition on the acquired voice. Based on the result of the voice recognition, the information terminal 10 performs control related to operation of the intercom system 1. In other words, the information terminal 10 is configured to be operable by voice. For example, when the information terminal 10 obtains, from the voice of the user in the dwelling unit E2, a keyword (control word) for controlling the opening and closing of an entrance door 200 (door) provided at the common entrance E1, it performs the control corresponding to that control word. Specifically, when the information terminal 10 detects "open the door" as a control word by voice recognition, it performs control to open the entrance door 200 (see FIG. 2). Here, the entrance door 200 is configured to be opened and closed by means of an electric lock 201 (see FIG. 2).

The information terminal 10 according to the present embodiment is configured to be able to communicate with a server 70 via a network NT1. The server 70 stores a plurality of dictionary files that the information terminal 10 uses for the voice recognition involved in voice operation. For example, each dictionary file is a file relating to pronunciation characteristics. The dictionary files include, for example, a dictionary file for men and the dialect of region A, a dictionary file for women and the dialect of region A, a dictionary file for men and children, and a dictionary file for men and elderly people. The dictionary files further include, for example, a dictionary file for men and a native language (for example, English) and a dictionary file of speech related to unlocking the electric lock.
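One way to picture the collection of dictionary files held by the server 70 is as a catalog in which each file carries a set of attribute tags. The Python sketch below is an assumption made for illustration: the disclosure does not specify how the files are indexed, and the names `DictionaryFile`, `CATALOG`, `find_dictionaries`, and all tag strings are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DictionaryFile:
    name: str
    tags: frozenset  # attribute tags; the tag vocabulary here is hypothetical


# Entries mirror the examples listed in the text; file names are illustrative.
CATALOG = [
    DictionaryFile("male_dialect_A", frozenset({"male", "dialect_A"})),
    DictionaryFile("female_dialect_A", frozenset({"female", "dialect_A"})),
    DictionaryFile("male_child", frozenset({"male", "child"})),
    DictionaryFile("male_elderly", frozenset({"male", "elderly"})),
    DictionaryFile("male_english", frozenset({"male", "english"})),
    DictionaryFile("unlock_words", frozenset({"unlock"})),
]


def find_dictionaries(wanted):
    """Return names of catalog entries whose tags include every wanted tag."""
    wanted = frozenset(wanted)
    return [d.name for d in CATALOG if wanted <= d.tags]
```

For example, asking for `{"dialect_A"}` alone returns both the male and female dialect files, while adding `"male"` narrows the result to one file.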

The information terminal 10 according to the present embodiment acquires at least one of the dictionary files from the server 70 and stores it. Using the stored dictionary file, the information terminal 10 performs voice recognition processing and carries out control such as that for the operation of opening the entrance door 200 (the unlocking operation of the electric lock 201).

The information terminal 10 according to the present embodiment includes a sound acquisition unit 13, a voice recognition unit 182, and a control processing unit 183. The sound acquisition unit 13 acquires sound including at least the user's voice. The voice recognition unit 182 performs voice recognition based on the sound acquired by the sound acquisition unit 13. The control processing unit 183 performs control based on the voice recognition result of the voice recognition unit 182. The voice recognition unit 182 is configured so that the information referred to in voice recognition can be changed.

In the information terminal 10 of the present embodiment, the voice recognition unit 182 is configured so that the information referred to in voice recognition can be changed. The referenced information can therefore be changed to suit the characteristics of the user's pronunciation (accent, dialect, and so on), which further improves the accuracy of voice recognition.
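The division of roles among the three units, and the effect of changing the referenced information, can be sketched as follows. This is a minimal Python sketch under strong simplifying assumptions: plain text stands in for captured sound, recognition is reduced to substring matching, and the class and method names (`VoiceRecognizer`, `set_reference`, `ControlProcessor`, `process`) are hypothetical rather than taken from the disclosure.

```python
class VoiceRecognizer:
    """Stand-in for the voice recognition unit 182 (substring matching only)."""

    def __init__(self, phrases):
        self.phrases = dict(phrases)  # spoken phrase -> command name

    def set_reference(self, phrases):
        """Replace the information referred to during recognition."""
        self.phrases = dict(phrases)

    def recognize(self, sound):
        text = sound.lower()
        for phrase, command in self.phrases.items():
            if phrase in text:
                return command
        return None  # nothing recognized


class ControlProcessor:
    """Stand-in for the control processing unit 183; it only logs commands."""

    def __init__(self):
        self.log = []

    def handle(self, command):
        if command is not None:
            self.log.append(command)


def process(sound, recognizer, controller):
    """Sound acquisition -> voice recognition -> control, in that order."""
    controller.handle(recognizer.recognize(sound))
```

With only `"open the door"` registered, an utterance such as "open up please" is ignored. After `set_reference` adds the variant phrase, the same utterance produces the unlock command, which is the benefit the paragraph above describes.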

(2) Configuration
(2-1) Information terminal
As shown in FIG. 1, the information terminal 10 includes a first communication unit 11, a second communication unit 12, a sound acquisition unit 13, an operation unit 14, an output unit 15, a display unit 16, a storage unit 17, a control unit 18, and a third communication unit 19.

The information terminal 10 has, for example, a microcomputer with a processor and a memory. The microcomputer functions as the control unit 18 when the processor executes a program stored in the memory. Here the program executed by the processor is recorded in advance in the memory of the microcomputer, but it may instead be provided on a non-transitory recording medium such as a memory card, or through a telecommunication line such as the Internet.

The first communication unit 11 is a communication interface for communicating with the lobby intercom 20 (specifically, its communication unit 21). The first communication unit 11 is connected to the control device 30 via the second trunk line 62, the branch line 63, and the branching unit 50. The first communication unit 11 transmits audio signals, control signals, and the like to the lobby intercom 20 via the control device 30. It also receives audio signals, video signals, and the like from the lobby intercom 20 via the control device 30.

The second communication unit 12 is a communication interface for communicating with the entrance slave unit 40. The second communication unit 12 is connected to the entrance slave unit 40 via the connection line 64. The second communication unit 12 transmits audio signals, control signals, and the like to the entrance slave unit 40, and receives audio signals, video signals, and the like from the entrance slave unit 40.

The third communication unit 19 is a communication interface for communicating with the outside (here, the server 70). The third communication unit 19 is connected to the server 70 via the network NT1. The third communication unit 19 transmits information requesting a dictionary file to the server 70 and receives from the server 70 at least one of the dictionary files, according to the request.

The sound acquisition unit 13 acquires sound including at least the user's voice and outputs sound information on the acquired sound to the control unit 18. The sound acquisition unit 13 has one microphone 131. The microphone 131 picks up ambient sound, including the voice of a user located in front of the information terminal 10, converts it into an analog sound signal (sound information), and outputs the signal to the control unit 18.

The operation unit 14 is configured to accept operations by a user (for example, a resident of a dwelling unit E2). The operation unit 14 has at least a call button. The call button is a button for starting communication with the lobby intercom 20 or the entrance slave unit 40 (a call with a visitor or the like) in response to a call from the lobby intercom 20 or the entrance slave unit 40. That is, when the call button is pressed while the first communication unit 11 is receiving a call signal for calling the resident, a voice call becomes possible between the information terminal 10 and the lobby intercom 20 or the entrance slave unit 40.

The output unit 15 is, for example, a speaker. When the information terminal 10 is able to talk with the lobby intercom 20, the output unit 15 outputs sound (including the voice of a visitor or the like) based on the sound data transmitted from the lobby intercom 20. When the information terminal 10 is able to talk with the entrance slave unit 40, the output unit 15 outputs sound (including the voice of a visitor or the like) based on the sound data transmitted from the entrance slave unit 40.

The display unit 16 is, for example, a liquid crystal display and is configured to display video. When the information terminal 10 is able to talk (communicate) with the lobby intercom 20, the display unit 16 displays the video captured by that lobby intercom 20. When the information terminal 10 is able to talk (communicate) with the entrance slave unit 40, the display unit 16 displays the video captured by that entrance slave unit 40. If the information terminal 10 has a touch panel display, the touch panel display may serve as both the display unit 16 and the operation unit 14.

The storage unit 17 is a readable and writable memory, for example, a flash memory. The storage unit 17 stores, for example, at least one dictionary file acquired from the server 70.

The dictionary files include dictionary files based on information relating to at least one of the following: the type of language that the voice recognition unit 182 can recognize (Japanese, English, etc.), the manner of expression within the same language (dialect, accent, etc.), and the speaker type within the same language (man, woman, child, elderly person, etc.). Hereinafter, a dictionary file based on at least one of these is referred to as a dictionary file based on language information.

The dictionary files further include a dictionary file containing the predetermined keyword that triggers detection of control words, and a dictionary file containing the control words.

As shown in FIG. 1, the control unit 18 has a voice processing unit 181, a voice recognition unit 182, a control processing unit 183, a display processing unit 184, and a transmission unit 185.

The voice processing unit 181 acquires the analog sound signal output by the sound acquisition unit 13 and converts it into a digital sound signal. The voice processing unit 181 is also configured to apply predetermined filtering and similar processing to the sound signal acquired from the sound acquisition unit 13. The voice processing unit 181 includes, for example, an echo canceller, which suppresses or removes echo from the sound signal output from the microphone 131.
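As a rough illustration of the analog-to-digital conversion and filtering steps, the Python sketch below quantizes samples to 16-bit integers and then smooths them. The disclosure does not specify the sample format or the filter, so the 16-bit depth, the trailing moving average, and the function names `to_pcm16` and `moving_average` are all assumptions. An echo canceller is a considerably more involved adaptive filter and is not shown.

```python
def to_pcm16(samples):
    """Quantize float samples in [-1.0, 1.0] to signed 16-bit integers."""
    out = []
    for x in samples:
        x = max(-1.0, min(1.0, x))  # clip out-of-range input
        out.append(int(round(x * 32767)))
    return out


def moving_average(pcm, width=3):
    """Smooth with a trailing moving average (an illustrative filter only)."""
    out = []
    for i in range(len(pcm)):
        window = pcm[max(0, i - width + 1): i + 1]
        out.append(sum(window) // len(window))  # integer (floor) average
    return out
```

The two functions would be applied in sequence, `moving_average(to_pcm16(samples))`, before the result is handed to recognition.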

The voice recognition unit 182 performs voice recognition processing based on the voice acquired by the sound acquisition unit 13. Specifically, it performs voice recognition processing on the sound processed by the voice processing unit 181, using at least one dictionary file stored in the storage unit 17.

The voice recognition unit 182 is configured to be able to execute voice recognition processing while the information terminal 10 is communicating with the intercom entrance device (lobby intercom 20 or entrance slave unit 40), that is, while the user is in a call (an intercom call).

When a call is incoming from the intercom entrance device (lobby intercom 20 or entrance slave unit 40), that is, while the intercom is not yet in a call, the voice recognition unit 182 uses voice recognition processing to determine whether the sound processed by the voice processing unit 181 contains a predetermined keyword. If it determines that the sound contains the predetermined keyword, the voice recognition unit 182 instructs the control processing unit 183 to start communication (a call) with the intercom entrance device concerned. The voice recognition unit 182 continues voice recognition processing until the communication between the information terminal 10 and the intercom entrance device ends. For example, when it determines that the processed sound contains the predetermined keyword, the voice recognition unit 182 starts voice recognition processing for detecting keywords related to control in the intercom system 1 (control words).

The voice recognition unit 182 performs the voice recognition processing for detecting control words until the communication between the information terminal 10 and the intercom entrance device ends. Specifically, the voice recognition unit 182 determines whether the sound processed by the voice processing unit 181 contains a control word.

If a call between the intercom entrance device (lobby intercom 20 or entrance slave unit 40) and the information terminal 10 has been started by an operation on the operation unit 14 before the predetermined keyword is acquired, the voice recognition unit 182 does not perform the voice recognition processing for detecting control words until the predetermined keyword is acquired. During the call, the voice recognition unit 182 determines whether the sound processed by the voice processing unit 181 contains the predetermined keyword, and when it determines that it does, starts the voice recognition processing for detecting control words.
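The keyword-gated behavior described in the preceding paragraphs amounts to a small state machine. The following Python sketch is one possible reading, not the disclosed implementation: text stands in for processed sound, matching is by substring, and the state names, the `Terminal` class, and the example trigger phrase "hello intercom" are hypothetical.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    RINGING = auto()        # incoming call, not yet answered
    IN_CALL = auto()        # answered by button; trigger keyword not yet heard
    CONTROL_READY = auto()  # in a call and listening for control words


class Terminal:
    def __init__(self, trigger, control_words):
        self.trigger = trigger
        self.control_words = tuple(control_words)
        self.state = State.IDLE
        self.actions = []  # control words detected so far

    def incoming_call(self):
        if self.state is State.IDLE:
            self.state = State.RINGING

    def press_call_button(self):
        if self.state is State.RINGING:
            self.state = State.IN_CALL

    def end_call(self):
        # Control-word detection stops when communication ends.
        self.state = State.IDLE

    def hear(self, text):
        if self.state in (State.RINGING, State.IN_CALL):
            if self.trigger in text:
                # The trigger starts the call if needed and enables control words.
                self.state = State.CONTROL_READY
        elif self.state is State.CONTROL_READY:
            for word in self.control_words:
                if word in text:
                    self.actions.append(word)
```

In this reading, saying "open the door" while the terminal is ringing, or during a call answered by button, has no effect until the trigger keyword has been heard. This matches the statement that control-word detection is not performed before the predetermined keyword is acquired.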

The voice recognition unit 182 is configured so that the dictionary files, which are the information referred to in voice recognition, can be changed. Here, changing the information includes adding keywords that the voice recognition unit 182 can recognize. Among the operations of the information terminal 10 and the operations of devices controllable by the information terminal 10, some operations are already controlled by the currently registered control words or the predetermined keyword. The keywords to be added include, for example, words for controlling operations other than those already-controlled operations. The already-controlled operations include operations related to calls in the intercom system 1 (intercom calls). They also include the unlocking operation of the electric lock 201 provided on the entrance door 200 used for entering and leaving the facility 5 in which the information terminal 10 is installed. The other operations are therefore, for example, operations different from those related to intercom calls and from the unlocking operation of the electric lock 201. In this way, operations controllable by voice recognition can be added.

Furthermore, changing the information includes at least one of the following: adding a type of language (Japanese, English, etc.) that the voice recognition unit 182 can recognize; adding a manner of expression within the same language (a dialect, etc.); and adding a type of speaker within the same language (male, female, child, elderly person, etc.). Specifically, changing the information includes adding a dictionary file based on at least one of the type of language that the voice recognition unit 182 can recognize, the manner of expression within the same language, and the type of speaker within the same language (a dictionary file based on language information).

When the operation unit 14 receives an operation for changing the information, the voice recognition unit 182 determines, based on that operation, how the information referred to in voice recognition is to be changed. For example, as the content of the change, the voice recognition unit 182 determines at least one of a dictionary file containing the keywords to be added and a dictionary file based on language information. The voice recognition unit 182 requests the determined dictionary file from the server 70. On receiving the requested dictionary file from the server 70, the voice recognition unit 182 stores it in the storage unit 17.

The voice recognition unit 182 may also determine, as the content of the change to the information referred to in voice recognition, the deletion of a dictionary file stored in the storage unit 17. In that case, the voice recognition unit 182 deletes the target dictionary file from the storage unit 17.
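The addition and deletion of dictionary files described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation in the disclosure: `DictionaryFile`, `StubServer`, `DictionaryStore`, and the phrase lists are hypothetical names and data, and the actual request format between the information terminal 10 and the server 70 is not specified in the text.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DictionaryFile:
    """Hypothetical dictionary file: a name plus the phrases it makes recognizable."""
    name: str
    phrases: frozenset


class StubServer:
    """Stand-in for server 70; the real request/response protocol is not described."""

    def __init__(self, catalog):
        self._catalog = dict(catalog)

    def fetch(self, name):
        return DictionaryFile(name, frozenset(self._catalog[name]))


@dataclass
class DictionaryStore:
    """Stand-in for the dictionary files held in storage unit 17."""
    files: dict = field(default_factory=dict)

    def add_from_server(self, server, name):
        # Request the selected file, then store what the server returns.
        self.files[name] = server.fetch(name)

    def delete(self, name):
        # Remove a stored file; deleting an unknown name is a no-op.
        self.files.pop(name, None)

    def vocabulary(self):
        # Phrases the recognizer can currently match: the union over stored files.
        vocab = set()
        for stored in self.files.values():
            vocab |= stored.phrases
        return vocab


server = StubServer({
    "door": ["open the door"],
    "volume": ["volume up", "volume down"],
})
store = DictionaryStore()
store.add_from_server(server, "door")
store.add_from_server(server, "volume")
store.delete("door")
print(sorted(store.vocabulary()))
```

Keeping each dictionary as a separate named file, rather than merging phrases into one list, is what makes deletion of a single dictionary straightforward in this sketch.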

The control processing unit 183 performs control based on the voice recognition result of the voice recognition unit 182. The control processing unit 183 is configured to be able to control, based on that result, some of the operations of at least one of the information terminal 10 and the devices controllable by the information terminal 10.

On receiving an instruction to start communication (a call) with the target intercom entrance device (lobby intercom 20, entrance slave unit 40), the control processing unit 183 controls the operation of the information terminal 10. For example, the control processing unit 183 controls the first communication unit 11 or the second communication unit 12 so as to establish communication between the information terminal 10 and the intercom entrance device, so that a call can be made between them.

The control processing unit 183 controls the operation of the devices controllable by the information terminal 10 according to the result of the determination by the voice recognition unit 182 as to whether the sound processed by the voice processing unit 181 includes a control word. For example, the control processing unit 183 performs processing related to control in the intercom system 1 depending on whether the voice recognition unit 182 has detected a control word. More specifically, when the voice recognition unit 182 determines that the sound processed by the voice processing unit 181 includes a control word, the control processing unit 183 performs the processing corresponding to that control word. For example, suppose that, while the information terminal 10 is communicating with the lobby intercom 20, the information terminal 10 receives from the user a voice that includes the control word "open the door". In this case, the voice recognition unit 182 determines that the voice processed by the voice processing unit 181 includes the control word "open the door". The control processing unit 183 then controls the unlocking operation of the electric lock 201 of the entrance door 200 used to enter from the common entrance E1. For example, the control processing unit 183 controls the control device 30 so that the electric lock 201 of the entrance door 200 used to enter from the common entrance E1 performs the unlocking operation.
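The mapping from a detected control word to its processing can be sketched as a lookup table of actions. This is an illustrative sketch only: it works on already-recognized text rather than audio, matches control words by simple substring search (an assumption, since the text does not describe the matching method), and `StubControlDevice`, `find_control_word`, and `handle_recognized_text` are hypothetical names.

```python
class StubControlDevice:
    """Stand-in for control device 30, which drives electric lock 201."""

    def __init__(self):
        self.unlocked = False

    def unlock_entrance_door(self):
        self.unlocked = True


def find_control_word(text, control_words):
    """Return the first registered control word contained in text, else None."""
    for word in control_words:
        if word in text:
            return word
    return None


def handle_recognized_text(text, actions):
    """Run the action mapped to a detected control word; report whether one ran."""
    word = find_control_word(text, actions)
    if word is None:
        return False
    actions[word]()
    return True


device = StubControlDevice()
actions = {"open the door": device.unlock_entrance_door}

handled = handle_recognized_text("please open the door now", actions)
ignored = handle_recognized_text("who is it", actions)
```

With a table like this, adding a dictionary file with new control words would correspond to adding entries to `actions`, while speech that contains no registered control word is simply ignored.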

Furthermore, the control processing unit 183 also controls the unlocking of the electric lock 201 of the entrance door 200 when the operation unit 14 receives a predetermined operation from the user.

The display processing unit 184 performs processing for displaying, on the display unit 16, the image captured by the intercom entrance device being communicated with (lobby intercom 20, entrance slave unit 40).

The transmission unit 185 outputs sound data (sound information) for the sound acquired by the sound acquisition unit 13 to the device operated by the other party to the call. Specifically, the transmission unit 185 transmits the sound signal, in which noise has been suppressed or removed by the voice processing unit 181, to the intercom entrance device being communicated with (lobby intercom 20, entrance slave unit 40) via the first communication unit 11 or the second communication unit 12. For example, when the information terminal 10 is communicating with the lobby intercom 20, the transmission unit 185 transmits the noise-suppressed sound signal to the lobby intercom 20 via the first communication unit 11.

(2-2) Lobby intercom
As shown in FIG. 2, the lobby intercom 20 includes a communication unit 21, a control unit 22, a call unit 23, a display unit 24, an operation unit 25, a storage unit 26, and an imaging unit 27.

The lobby intercom 20 has, for example, a microcomputer with a processor and a memory. The microcomputer functions as the control unit 22 when the processor executes a program stored in the memory. Here, the program executed by the processor is recorded in advance in the memory of the microcomputer, but it may instead be provided on a non-transitory recording medium such as a memory card, or through a telecommunication line such as the Internet.

The communication unit 21 is a communication interface for communicating with the information terminal 10 (specifically, its first communication unit 11). The communication unit 21 is connected to the control device 30 (specifically, its communication unit 31) via the first trunk line 61. Via the control device 30, the communication unit 21 transmits audio signals, video signals, and the like to the information terminal 10, and receives audio signals, control signals, and the like from the information terminal 10. Here, a communication signal from the lobby intercom 20 includes information for identifying the information terminal 10 (for example, address information). Therefore, only the information terminal 10 whose assigned address information matches the address information in the communication signal can receive that signal.

The control unit 22 is configured to control the communication unit 21, the call unit 23, the imaging unit 27, and the like.

The call unit 23 includes a speaker and a microphone, and is configured so that calls can be made with the information terminal 10.

The display unit 24 is, for example, a liquid crystal display. The display unit 24 is configured to display the video captured by the imaging unit 27. The display unit 24 is also configured to display messages to visitors and others, for example a message prompting a visitor to speak, such as "Please speak". In this case, an equivalent voice message may be output (announced) from the speaker of the call unit 23 or from a speaker provided separately from it. The display unit 24 and a speaker may also be used together. When the information terminal 10 is provided with a touch panel display, the touch panel display may serve as both the display unit 24 and the operation unit 25.

The operation unit 25 is configured to accept operations by a user (for example, a visitor to or a resident of the apartment building 5). The operation unit 25 is, for example, an input interface having a plurality of push-button switches, a touch panel, and the like.

The storage unit 26 is a readable and writable memory, for example a flash memory. The storage unit 26 stores, for example, video data of the video (images) captured by the imaging unit 27.

The imaging unit 27 is a camera that has an image sensor and captures images of a subject (user). In the present embodiment, the imaging area (field of view) of the imaging unit 27 is set in front of the information terminal 10. In the present embodiment, the imaging unit 27 is a camera that captures moving images in color. The imaging unit 27 may instead be a camera that captures still images (a still camera) or a camera that captures monochrome images.

The image sensor is, for example, a two-dimensional image sensor such as a CCD (Charge Coupled Device) image sensor or a CMOS (Complementary Metal-Oxide Semiconductor) image sensor. The imaging unit 27 uses an optical system such as a lens to form an image of the light from the subject on the imaging surface (light-receiving surface) of the image sensor, which converts that light into an electric signal. The imaging unit 27 then outputs the output signal of the image sensor to the control unit 22 as a video signal.

(2-3) Control device
As shown in FIG. 2, the control device 30 includes a communication unit 31, a control unit 32, and a storage unit 33.

The control device 30 has, for example, a microcomputer with a processor and a memory. The microcomputer functions as the control unit 32 when the processor executes a program stored in the memory. Here, the program executed by the processor is recorded in advance in the memory of the microcomputer, but it may instead be provided on a non-transitory recording medium such as a memory card, or through a telecommunication line such as the Internet.

The communication unit 31 includes a communication interface for communicating with each information terminal 10 and with the lobby intercom 20. The communication unit 31 is connected to the lobby intercom 20 via the first trunk line 61 and to each information terminal 10 via the second trunk line 62. That is, the communication unit 31 is configured to relay communication between each information terminal 10 and the lobby intercom 20. The communication unit 31 is also connected to the electric lock 201 of the entrance door 200 and is configured to be able to communicate with it.

The control unit 32 is configured to control the communication unit 31. In response to an instruction from the information terminal 10, the control unit 32 controls the unlocking operation of the electric lock 201 of the entrance door 200. For example, the control unit 32 outputs an unlocking signal, which instructs unlocking, to the electric lock 201 via the communication unit 31.

The storage unit 33 is a readable and writable memory, for example a flash memory. The storage unit 33 stores, for example, a correspondence table that associates the room number assigned to each dwelling unit E2 with the address information assigned to each information terminal 10. In the control device 30, the control unit 32 refers to the correspondence table, creates a signal in which the room number contained in the signal from the lobby intercom 20 is replaced with the address information of the corresponding information terminal 10, and has the communication unit 31 transmit this signal to each information terminal 10. In each information terminal 10, when the address information contained in the signal received by the first communication unit 11 matches the address information stored in the storage unit 17, the control unit 18 acquires the information contained in the signal. When the address information does not match, the control unit 18 discards the information contained in the signal.
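The room-number-to-address translation and the per-terminal address check can be sketched as follows. This is an illustrative model only: signals are represented as plain dictionaries, and the room numbers, address strings, field names, and function names are hypothetical, since the text does not specify the signal format.

```python
def to_addressed_signal(signal, room_to_address):
    """Control device 30 side: swap the room number for the terminal's address."""
    address = room_to_address[signal["room"]]
    return {"address": address, "payload": signal["payload"]}


def receive(signal, own_address):
    """Terminal side: keep the payload only when the address matches, else discard."""
    if signal["address"] == own_address:
        return signal["payload"]
    return None


# Hypothetical correspondence table, as held in storage unit 33.
room_to_address = {"101": "addr-A", "102": "addr-B"}

call = {"room": "102", "payload": "call request"}
addressed = to_addressed_signal(call, room_to_address)

# The same signal reaches every terminal; only the addressed one keeps it.
at_terminal_a = receive(addressed, "addr-A")
at_terminal_b = receive(addressed, "addr-B")
```

Because the signal is sent to every terminal and filtered on arrival, the lobby intercom only needs to know room numbers, and the address assignment stays inside the control device's table.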

In the present embodiment, the communication unit 31 is connected to the electric lock 201, but the configuration is not limited to this. The control device 30 may have a communication unit separate from the communication unit 31, and that separate communication unit may be connected to the electric lock 201.

(2-4) Entrance slave unit
As shown in FIG. 2, each entrance slave unit 40 is connected to the corresponding information terminal 10 via a connection line 64. The entrance slave unit 40 transmits audio signals, video signals, and the like to the information terminal 10, and receives audio signals, control signals, and the like from the information terminal 10.

(3) Operation
This section describes the operation of the information terminal 10.

(3-1) Dictionary file setting process
First, the process by which the information terminal 10 sets (changes) dictionary files, in particular the process for adding a dictionary file, is described with reference to FIG. 3.

When the operation unit 14 receives an operation for changing the information, the voice recognition unit 182 of the information terminal 10 determines, based on that operation, how the information referred to in voice recognition is to be changed (step S1). Specifically, the display unit 16 displays a list of dictionary files that can be added. The operation unit 14 accepts an operation selecting at least one dictionary file to be added from the displayed list. The voice recognition unit 182 determines the content of the change by identifying the selected dictionary file or files. The voice recognition unit 182 then acquires the selected dictionary file or files from the server 70 (step S2). Specifically, the voice recognition unit 182 requests the selected dictionary file or files from the server 70 and receives them from the server 70.

The voice recognition unit 182 performs the setting process (step S3). Specifically, the voice recognition unit 182 stores the acquired dictionary file or files in the storage unit 17.

(3-2) Operation during a call
This section describes the operation of the information terminal 10 during a call, with reference to FIG. 4.

While a call request is being made from the intercom entrance device (lobby intercom 20, entrance slave unit 40), the sound acquisition unit 13 acquires sound that includes the voice of the user of the dwelling unit E2 (step S11).

While a call request is being made from the intercom entrance device, that is, while the intercom is not yet in a call, the voice recognition unit 182 uses voice recognition processing to determine whether the sound processed by the voice processing unit 181 includes the predetermined keyword (step S12).

When the voice recognition unit 182 determines that the sound processed by the voice processing unit 181 does not include the predetermined keyword ("No" in step S12), the process returns to step S11.

When the voice recognition unit 182 determines that the sound processed by the voice processing unit 181 includes the predetermined keyword ("Yes" in step S12), the voice recognition unit 182 performs voice recognition processing (step S13). Specifically, the voice recognition unit 182 causes the control processing unit 183 to start communication (a call) with the intercom entrance device. The voice recognition unit 182 continues voice recognition processing until communication between the information terminal 10 and the intercom entrance device ends. More specifically, once the call has started, the voice recognition unit 182 performs voice recognition processing to detect keywords related to control in the intercom system 1 (control words).

The voice recognition unit 182 determines whether a control word has been detected (step S14). Specifically, the voice recognition unit 182 determines whether the sound processed by the voice processing unit 181 includes a control word. If it does, the voice recognition unit 182 determines that a control word has been detected; if it does not, the voice recognition unit 182 determines that no control word has been detected.

When the voice recognition unit 182 determines that no control word has been detected ("No" in step S14), the process returns to step S13. The voice recognition unit 182 then continues the voice recognition processing for detecting control words, using the sound processed by the voice processing unit 181 during the call.

When the voice recognition unit 182 determines that a control word has been detected ("Yes" in step S14), the control processing unit 183 performs control processing (step S15). The control processing unit 183 controls the operation of the device controlled by the information terminal 10 according to the control word detected by the voice recognition unit 182. For example, suppose that, while the information terminal 10 is communicating with the lobby intercom 20, the information terminal 10 receives from the user a voice that includes the control word "open the door". In this case, the control processing unit 183 controls the control device 30 so as to unlock the electric lock 201 of the entrance door 200 used to enter from the common entrance E1.

The control processing unit 183 determines whether the call has ended (step S16). When the control processing unit 183 determines that the call has ended ("Yes" in step S16), the process ends. When the control processing unit 183 determines that the call has not ended ("No" in step S16), the process returns to step S13.
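The flow of steps S11 to S16 can be sketched as two phases: waiting for the predetermined keyword while the terminal is being called, then detecting control words until the call ends. This is a simplified sketch under stated assumptions: it takes already-recognized text instead of audio, treats the end of the input list as the end of the call, and uses "answer" as a hypothetical predetermined keyword, which the text does not name. `run_call_flow` is likewise a hypothetical function.

```python
def run_call_flow(utterances, wake_keyword, actions):
    """Walk a list of recognized utterances through steps S11 to S16.

    The call is treated as ended when the utterances run out.
    Returns the list of control words that were acted on.
    """
    executed = []
    stream = iter(utterances)

    # S11/S12: while the terminal is being called, wait for the keyword.
    for text in stream:
        if wake_keyword in text:
            break
    else:
        return executed  # the keyword never came, so no call was started

    # S13 to S16: during the call, look for control words until it ends.
    for text in stream:
        for word, action in actions.items():
            if word in text:           # S14: control word detected
                action()               # S15: control processing
                executed.append(word)
    return executed


log = []
actions = {"open the door": lambda: log.append("unlock")}
result = run_call_flow(
    ["open the door", "answer", "hello", "open the door please"],
    wake_keyword="answer",
    actions=actions,
)
```

In this run the first "open the door" is ignored because the keyword has not yet been heard, and only the request made during the call triggers the unlock action.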

(4) Advantages
As described above, the information terminal 10 of the present embodiment operates as an intercom device (intercom master unit). The information terminal 10 includes the sound acquisition unit 13, the voice recognition unit 182, and the control processing unit 183. The sound acquisition unit 13 acquires sound that includes at least the user's voice. The voice recognition unit 182 performs voice recognition based on the sound acquired by the sound acquisition unit 13. The control processing unit 183 performs control based on the voice recognition result of the voice recognition unit 182. The voice recognition unit 182 is configured so that the information referred to in voice recognition can be changed.

With this configuration, the information referred to in voice recognition can be changed, so voice recognition can refer to information suited to the characteristics of the user's voice. As a result, the information terminal 10 of the present embodiment can further improve the accuracy of voice recognition.

In the present embodiment, changing the information referred to in voice recognition includes at least one of adding a type of language that the voice recognition unit 182 can recognize, adding a manner of expression within the same language, and adding a type of speaker within the same language.

As a result, the information used for voice recognition can be changed to information suited to the user, so voice recognition can be performed accurately without increasing its processing load. The accuracy of voice recognition can thus be further improved.

(5) Modifications
The above embodiment is only one of various embodiments of the present disclosure. The above embodiment can be modified in various ways according to the design and the like, as long as the object of the present disclosure can be achieved.

Modifications of the above embodiment are listed below. The modifications described below can be applied in combination as appropriate.

(5-1) Modification 1
The intercom system 1 may be linked with an alarm system 80 (see FIG. 5) provided in the dwelling unit E2. The alarm system 80 includes, for example, a system that detects intrusion by a suspicious person and a system that detects fire and the like. When the alarm system 80 detects intrusion by a suspicious person, the information terminal 10 outputs an alarm sound. The information terminal 10 also outputs an alarm sound when the alarm system 80 detects a fire.

In this case, the confirmation operation of the alarm system 80 may be among the operations that the control processing unit 183 can control. For example, according to the voice recognition result, the control processing unit 183 controls the information terminal 10 to output the alarm sound so that its output can be checked. Likewise, according to the voice recognition result obtained while the alarm sound is being output, the control processing unit 183 controls the information terminal 10 to stop outputting the alarm sound.

When, during the process of changing (adding to) the information, the voice recognition unit 182 acquires a dictionary file related to outputting or stopping the alarm sound, that is, a dictionary file containing control words for outputting or stopping the alarm sound, the output or stopping of the alarm sound corresponds to an operation other than the partial operations described above.

(5-2) Modification 2
In the above embodiment, control of the first communication unit 11 or the second communication unit 12, depending on which intercom entrance device (lobby intercom 20, entrance slave unit 40) is being communicated with, was described as an example of controlling the operation of the information terminal 10 according to the voice recognition result. However, control of the operation of the information terminal 10 is not limited to this.

For example, the control processing unit 183 may adjust the volume of the sound output by the output unit 15 based on voice recognition. When, during the process of changing (adding to) the information, the voice recognition unit 182 acquires a dictionary file related to volume adjustment, that is, a dictionary file containing control words for adjusting the volume, volume adjustment corresponds to an operation other than the partial operations described above.

(5-3) Modification 3
In the above embodiment, the voice recognition unit 182 starts the voice recognition process for detecting control words when the sound acquisition unit 13 acquires the predetermined keyword during an intercom call, but the configuration is not limited to this.

It is not essential for the sound acquisition unit 13 to acquire the predetermined keyword during an intercom call. That is, once a call using the information terminal 10 and the intercom entrance device (lobby intercom 20, entrance slave unit 40) has started, the voice recognition unit 182 may start the voice recognition process for detecting control words even if the sound acquisition unit 13 has not acquired the predetermined keyword. In other words, during an intercom call, the voice recognition unit 182 starts the voice recognition process for detecting control words, as the voice recognition related to control, independently of whether the sound acquisition unit 13 has acquired the predetermined keyword.

(5-4) Modification 4
In the above embodiment, the information terminal 10 is a dwelling unit terminal (intercom master unit), but the configuration is not limited to this.

The information terminal 10 may be any terminal configured to communicate with the intercom entrance device (lobby intercom 20, entrance slave unit 40), and may be, for example, a tablet terminal or a smartphone.

(5-5) Modification 5
In the present embodiment, the control device 30 controls the unlocking operation of the electric lock 201, but the configuration is not limited to this.

The lobby intercom 20 may control the unlocking operation of the electric lock 201. In this case, upon receiving an instruction to unlock the electric lock 201 from the information terminal 10, the lobby intercom 20 outputs an unlocking signal to the electric lock 201.

(Other modifications)
The above embodiment is only one of various embodiments of the present disclosure. The above embodiment can be modified in various ways depending on the design and the like, as long as the object of the present disclosure can be achieved. Functions equivalent to those of the information terminal 10 may also be embodied as a processing method, a computer program, a non-transitory recording medium on which the program is recorded, or the like. A processing method of the information terminal 10 according to one aspect is a processing method used in an information terminal that operates as an intercom device. The processing method includes a sound acquisition step, a voice recognition step, and a control processing step. The sound acquisition step acquires sound including at least a user's voice. The voice recognition step performs voice recognition processing based on the sound acquired in the sound acquisition step. The control processing step performs control based on the voice recognition result of the voice recognition step. The voice recognition processing is configured so that the information referred to in voice recognition can be changed. A program according to one aspect is a program for causing a computer system to function as the above-described information terminal 10 or to execute the processing method of the information terminal 10.
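
The three steps of the processing method can be pictured as a small pipeline. The following Python sketch is illustrative only and is not part of the disclosure: sound is modeled as already-transcribed text, and the names `acquire_sound`, `Recognizer`, `control`, and `process` are hypothetical.

```python
from typing import Optional


def acquire_sound(raw: str) -> str:
    """Sound acquisition step (sound is modeled as text for illustration)."""
    return raw.strip().lower()


class Recognizer:
    """Voice recognition step with changeable reference information."""

    def __init__(self, reference: dict[str, str]) -> None:
        # Assumed layout: spoken phrase -> command name.
        self.reference = dict(reference)

    def change_reference(self, reference: dict[str, str]) -> None:
        # The information referred to in recognition can be replaced.
        self.reference = dict(reference)

    def recognize(self, sound: str) -> Optional[str]:
        return self.reference.get(sound)


def control(command: Optional[str]) -> str:
    """Control processing step: act on the recognition result."""
    return f"execute:{command}" if command else "ignore"


def process(raw: str, recognizer: Recognizer) -> str:
    """Run the three steps in order on one utterance."""
    return control(recognizer.recognize(acquire_sound(raw)))
```

In this sketch, calling `change_reference` with a different phrase table changes which utterances lead to control, without touching the acquisition or control steps.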

The information terminal 10, or the entity that executes the processing method of the information terminal 10, in the present disclosure includes a computer system. The computer system has a processor and a memory as hardware. The functions of the information terminal 10, or of the entity executing its processing method, are realized by the processor executing a program recorded in the memory of the computer system. The program may be recorded in advance in the memory of the computer system, or may be provided through a telecommunication line. The program may also be provided recorded on a non-transitory recording medium readable by the computer system, such as a memory card, an optical disc, or a hard disk drive. The processor of the computer system is composed of one or more electronic circuits including a semiconductor integrated circuit (IC) or a large-scale integrated circuit (LSI). Integrated circuits such as ICs and LSIs are called by different names depending on the degree of integration, and include those called system LSI, VLSI (Very Large Scale Integration), and ULSI (Ultra Large Scale Integration). Furthermore, an FPGA (Field-Programmable Gate Array) that is programmed after the LSI is manufactured, or a logic device in which the connection relationships inside the LSI or the circuit sections inside the LSI can be reconfigured, can also be adopted as the processor. The plurality of electronic circuits may be integrated on one chip or distributed over a plurality of chips. The plurality of chips may be integrated in one device or distributed over a plurality of devices.

Further, it is not essential that the plurality of functions of the information terminal 10 be integrated in one housing; the components of the information terminal 10 may be distributed over a plurality of housings. Furthermore, at least some of the functions of the information terminal 10 may be realized by the cloud (cloud computing) or the like.

(Summary)
As described above, the information terminal (10) of the first aspect operates as an intercom device. The information terminal (10) includes a sound acquisition unit (13), a voice recognition unit (182), and a control processing unit (183). The sound acquisition unit (13) acquires sound including at least a user's voice. The voice recognition unit (182) performs voice recognition based on the sound acquired by the sound acquisition unit (13). The control processing unit (183) performs control based on the voice recognition result of the voice recognition unit (182). The voice recognition unit (182) is configured so that the information referred to in voice recognition can be changed.

According to this configuration, the accuracy of voice recognition can be further improved.

The information terminal (10) of the second aspect, in the first aspect, further includes a communication unit (for example, the third communication unit 19) that communicates with the outside (for example, the server 70). The voice recognition unit (182) changes the information used for voice recognition based on information that the communication unit receives from the outside.

According to this configuration, because the information used for voice recognition is received from the outside, the information terminal (10) can store only the information it needs for voice recognition. As a result, the information terminal (10) performs voice recognition locally, so its responsiveness to the acquired voice is higher than when voice recognition is performed by an external device.

In the information terminal (10) of the third aspect, in the first or second aspect, the change of the information includes addition of a keyword (for example, a predetermined keyword or a control word) that the voice recognition unit (182) can recognize.

According to this configuration, keywords suited to the user can be registered.

In the information terminal (10) of the fourth aspect, in the third aspect, the control processing unit (183) is configured to be able to control, based on the voice recognition result of the voice recognition unit (182), some operations of at least one of the operation of the information terminal (10) and the operation of a device controllable by the information terminal (10). The keyword added by the change includes a word for controlling an operation other than those partial operations.

According to this configuration, operations controllable by voice recognition can be added.
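
As a rough illustration of how adding keywords can extend the set of voice-controllable operations, the Python sketch below merges a hypothetical dictionary file into an existing table of control words. The class name `ControlDictionary`, the phrases, and the operation names are all assumptions made for this example; the disclosure does not specify a data format.

```python
from typing import Optional


class ControlDictionary:
    """Maps control words to operations; new words can be added later."""

    def __init__(self) -> None:
        # Assumed initial set: the "partial operations" available at first.
        self._words: dict[str, str] = {
            "unlock": "unlock_electric_lock",
            "end call": "end_intercom_call",
        }

    def add_file(self, entries: dict[str, str]) -> list[str]:
        """Merge a dictionary file; return operations that became controllable."""
        before = set(self._words.values())
        self._words.update(entries)
        return sorted(set(self._words.values()) - before)

    def lookup(self, word: str) -> Optional[str]:
        """Return the operation for a recognized control word, if any."""
        return self._words.get(word)
```

Merging a volume-adjustment file such as `{"volume up": "raise_volume"}` would make `lookup("volume up")` return an operation that was not controllable before, which mirrors the volume adjustment example in Modification 2.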

In the information terminal (10) of the fifth aspect, in the fourth aspect, the partial operations include an operation related to an intercom call.

According to this configuration, an intercom call can be carried out by voice recognition.

In the information terminal (10) of the sixth aspect, in the fourth or fifth aspect, the partial operations include at least one of an unlocking operation of an electric lock (201) provided on an entrance door (200) for entering and leaving the facility (5) in which the information terminal (10) is installed, and a confirmation operation of an alarm system in the facility (5).

According to this configuration, at least one of the unlocking operation of the electric lock (201) and the confirmation operation of the alarm system (80) can be performed by voice recognition.

In the information terminal (10) of the seventh aspect, in any one of the first to sixth aspects, the voice recognition unit (182) is configured to be able to execute voice recognition during an intercom call.

According to this configuration, operation by voice recognition is possible even during an intercom call.

In the information terminal (10) of the eighth aspect, in the seventh aspect, the voice recognition unit (182) starts voice recognition related to control when the sound acquisition unit (13) acquires a predetermined keyword while the intercom is not in a call. While the intercom is in a call, the voice recognition unit (182) either starts the voice recognition related to control when the sound acquisition unit (13) acquires the predetermined keyword, or starts the voice recognition related to control without depending on acquisition of the predetermined keyword.

According to this configuration, operation by voice recognition is possible regardless of the call state of the intercom.
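
The start condition for control-related voice recognition can be expressed as a small decision function. The Python sketch below is one possible reading, not the disclosed implementation; `CallState`, `should_start_control_recognition`, and the flag `require_keyword_in_call` are hypothetical names, with the flag selecting between the two in-call alternatives.

```python
from enum import Enum


class CallState(Enum):
    IDLE = "idle"        # intercom is not in a call
    IN_CALL = "in_call"  # intercom call in progress


def should_start_control_recognition(
    state: CallState,
    keyword_heard: bool,
    require_keyword_in_call: bool = True,
) -> bool:
    """Decide whether to start recognition of control words."""
    if state is CallState.IDLE:
        # Outside a call, the predetermined keyword is always required.
        return keyword_heard
    # During a call, the keyword is required only in the first alternative.
    return keyword_heard or not require_keyword_in_call
```

With `require_keyword_in_call=False`, recognition of control words starts as soon as a call is in progress, which corresponds to Modification 3.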

The information terminal (10) of the ninth aspect, in any one of the first to eighth aspects, further includes an operation unit (14). The voice recognition unit (182) determines the content of the change to the information referred to in voice recognition based on an operation received by the operation unit (14).

According to this configuration, the content of the change to the information can be determined based on an operation by the user.

In the information terminal (10) of the tenth aspect, in any one of the first to ninth aspects, the change of the information includes at least one of addition of a type of language that the voice recognition unit (182) can recognize, addition of a way of expression in the same language, and addition of a speaker in the same language.

According to this configuration, information suited to the voice uttered by the user can be added.

The intercom system (1) of the eleventh aspect includes the information terminal (10) of any one of the first to tenth aspects, and an intercom entrance device (lobby intercom 20, entrance slave unit 40) that communicates with the information terminal (10).

The processing method of the twelfth aspect is used in an information terminal (10) that operates as an intercom device. The processing method includes a sound acquisition step, a voice recognition step, and a control processing step. The sound acquisition step acquires sound including at least a user's voice. The voice recognition step performs voice recognition processing based on the sound acquired in the sound acquisition step. The control processing step performs control based on the voice recognition result of the voice recognition step. The voice recognition processing is configured so that the information referred to in voice recognition can be changed.

According to this processing method, the accuracy of voice recognition can be further improved.

The program of the thirteenth aspect is a program for causing a computer to execute the processing method of the twelfth aspect.

According to this program, the accuracy of voice recognition can be further improved.

1 Intercom system
5 Apartment house
10 Information terminal
13 Sound acquisition unit
14 Operation unit
19 Third communication unit (communication unit)
20 Lobby intercom (intercom entrance device)
40 Entrance slave unit (intercom entrance device)
70 Server (external)
80 Alarm system
182 Voice recognition unit
183 Control processing unit
200 Entrance door (door)
201 Electric lock

Claims

An information terminal that operates as an intercom device, comprising:
a sound acquisition unit that acquires sound including at least a user's voice;
a voice recognition unit that performs voice recognition based on the sound acquired by the sound acquisition unit; and
a control processing unit that performs control based on a voice recognition result of the voice recognition unit,
wherein the voice recognition unit is configured so that information referred to in the voice recognition can be changed.

The information terminal according to claim 1, further comprising a communication unit that communicates with the outside,
wherein the voice recognition unit changes the information used for the voice recognition based on information received from the outside by the communication unit.

The information terminal according to claim 1 or 2,
wherein the change of the information includes addition of a keyword that the voice recognition unit can recognize.

The information terminal according to claim 3,
wherein the control processing unit is configured to be able to control, based on the voice recognition result of the voice recognition unit, some operations of at least one of the operation of the information terminal and the operation of a device controllable by the information terminal, and
the keyword added by the change includes a word for controlling an operation other than the partial operations.

The information terminal according to claim 4,
wherein the partial operations include an operation related to an intercom call.

The information terminal according to claim 4 or 5,
wherein the partial operations include at least one of an unlocking operation of an electric lock provided on a door for entering and leaving a facility in which the information terminal is installed, and a confirmation operation of an alarm system in the facility.

The information terminal according to any one of claims 1 to 6,
wherein the voice recognition unit is configured to be able to execute the voice recognition during an intercom call.

The information terminal according to claim 7,
wherein the voice recognition unit starts voice recognition related to the control when the sound acquisition unit acquires a predetermined keyword while the intercom is not in a call, and
while the intercom is in a call, either starts the voice recognition related to the control when the sound acquisition unit acquires the predetermined keyword, or starts the voice recognition related to the control without depending on acquisition of the predetermined keyword.

The information terminal according to any one of claims 1 to 8, further comprising an operation unit,
wherein the voice recognition unit determines the content of the change to the information referred to in the voice recognition based on an operation received by the operation unit.

The information terminal according to any one of claims 1 to 9,
wherein the change of the information includes at least one of addition of a type of language that the voice recognition unit can recognize, addition of a way of expression in the same language, and addition of a speaker in the same language.

An intercom system comprising:
the information terminal according to any one of claims 1 to 10; and
an intercom entrance device that communicates with the information terminal.

A processing method used in an information terminal that operates as an intercom device, the processing method comprising:
a sound acquisition step of acquiring sound including at least a user's voice;
a voice recognition step of performing voice recognition processing based on the sound acquired in the sound acquisition step; and
a control processing step of performing control based on a voice recognition result of the voice recognition step,
wherein the voice recognition processing is configured so that information referred to in the voice recognition can be changed.

A program for causing a computer to execute the processing method according to claim 12.