WO2010035524A1 - Intercom system - Google Patents

Intercom system

Info

Publication number
WO2010035524A1
Authority
WO
WIPO (PCT)
Prior art keywords
visitor
unit
face
information
imaging
Prior art date
Application number
PCT/JP2009/054739
Other languages
French (fr)
Japanese (ja)
Inventor
桂 内田
倫 渡邉
博之 高田
Original Assignee
Brother Kogyo Kabushiki Kaisha (ブラザー工業株式会社)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Brother Kogyo Kabushiki Kaisha (ブラザー工業株式会社)
Publication of WO2010035524A1 publication Critical patent/WO2010035524A1/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M11/00Telephonic communication systems specially adapted for combination with other electrical systems
    • H04M11/02Telephonic communication systems specially adapted for combination with other electrical systems with bell or annunciator systems
    • H04M11/025Door telephones

Definitions

  • the present invention relates to an intercom system, and more specifically, to an intercom system including a slave unit installed outdoors and a master unit installed indoors.
  • conventionally, an intercom system including a slave unit installed outside a house and a master unit installed indoors is known.
  • for example, there is an intercom that can capture a visitor with a camera provided in the slave unit and display the captured image of the visitor on a monitor provided in the master unit.
  • the main purpose of such an intercom system is to enable the resident to look at the visitor's face displayed on the monitor, determine who the visitor is, and respond appropriately. Therefore, there are intercoms that ask the visitor to take some action when an image in which the resident can recognize the visitor's face cannot be acquired.
  • Patent Document 1 discloses an intercom device that outputs, from a speaker of the slave unit, a sound instructing the visitor to move to a predetermined position when a normal face cannot be recognized from the captured image (JP 2005-92262 A).
  • the device of Patent Document 1, however, always gives the visitor the same instruction even though there are a plurality of possible reasons why the face cannot be recognized from the captured image. Therefore, even if the visitor acts according to the instruction, the action may not be appropriate. Moreover, when a visitor is deliberately acting so that the face is not recognized, the instruction may simply be ignored. If the resident cannot recognize the visitor's face, the resident may respond without due caution even when the visitor is a malicious person.
  • the present invention has been made to solve the above problems, and an object thereof is to provide an intercom system that, when an image in which the visitor's face is recognizable cannot be acquired, prompts the visitor to take an action appropriate to the cause and gives the resident a warning at the necessary level.
  • according to an aspect of the present invention, there is provided an intercom system including a slave unit installed outdoors and a master unit connected to the slave unit and installed indoors. The slave unit includes imaging means for imaging a predetermined imaging range in front of the slave unit and outputting captured image information that is image information of the imaging range, visitor detection means for detecting a visitor, and slave unit notification means for notifying the visitor of information.
  • the master unit includes image acquisition means for acquiring the captured image information output from the imaging means when a visitor is detected by the visitor detection means; recognition determination means for determining, based on the acquired captured image information, whether or not the visitor's face is recognizable; slave unit notification control means for, when the recognition determination means determines that the visitor's face is not recognizable, controlling the slave unit notification means to notify the visitor of information prompting different actions according to the cause that hinders recognition; master unit notification means for notifying information in the room; and first master unit notification control means for controlling the master unit notification means to notify information prompting different levels of warning depending on the cause that hinders recognition.
  • FIG. 1 is a block diagram showing the electrical configuration of the intercom system 1.
  • FIG. 2 is an external front view of the master unit 20 displaying the registrant mode screen 310.
  • FIG. 3 is an explanatory diagram of the storage areas included in the flash ROM 220 of the master unit 20.
  • FIG. 4 is an explanatory diagram showing an example of the data stored in the face feature storage area 221 of the flash ROM 220.
  • FIG. 5 is a flowchart of the main processing of the master unit 20.
  • FIG. 6 is a flowchart of the recognition determination process performed during the main processing of FIG. 5.
  • FIG. 7 is a flowchart of the detection error process performed during the recognition determination process of FIG. 6.
  • FIG. 8 is a flowchart of the feature extraction error process performed during the recognition determination process of FIG. 6.
  • FIG. 9 is a flowchart of the visitor process performed during the main processing of FIG. 5.
  • FIG. 10 is an external front view of the master unit 20 displaying the caution mode screen 320.
  • FIG. 11 is an external front view of the master unit 20 displaying the warning mode screen 330.
  • the intercom system 1 of the present embodiment includes a child device 10 and a parent device 20 connected via a signal line 30.
  • the slave unit 10 is installed outdoors, and the master unit 20 is installed indoors.
  • the slave unit 10 includes a substantially rectangular parallelepiped casing; a microphone 111, a speaker 112, a camera 113, a call button 114, a human sensor 115, and an infrared LED illumination 117 are provided on the front face of the casing (the surface facing the visitor).
  • the slave unit 10 includes a CPU 101, ROM 102, RAM 103, microphone 111, speaker 112, camera 113, call button 114, human sensor 115, infrared LED illumination 117, and communication device 150. These are all interconnected by a bus.
  • the CPU 101 controls the entire slave unit 10.
  • the ROM 102 stores programs necessary for basic operations of the slave unit 10 and setting values for the programs.
  • the CPU 101 controls the operation of the child device 10 according to a program stored in the ROM 102.
  • the RAM 103 is a storage device for temporarily storing various data.
  • the microphone 111 is a device that converts an input voice of a visitor into a voice signal and outputs the voice signal to the parent device 20 via the communication device 150.
  • the speaker 112 is a device that converts the audio signal input from the parent device 20 into sound and outputs the sound.
  • the camera 113 is, for example, a well-known CCD camera.
  • the camera 113 captures a predetermined imaging range in front of the slave unit 10 and outputs an image signal of the captured image to the master unit 20 via the communication device 150.
  • the predetermined imaging range is set in advance so as to include, for example, the area where the face of a visitor standing facing the front of the slave unit 10 is expected to be located when imaging at the normal magnification.
  • the camera 113 according to the present embodiment can capture images at different resolutions and at different magnifications, that is, at different angles of view using a zoom function, and can capture both still images and moving images.
  • the call button 114 is a button for a visitor to call a room responder. When call button 114 is pressed, a call signal is transmitted to base unit 20 via communication device 150.
  • Human sensor 115 is a sensor that detects the movement of an object in a predetermined area in front of handset 10.
  • the human sensor 115 for example, an infrared sensor that emits infrared rays to an object and detects the object based on a change in the amount of received infrared light can be employed.
  • a human sensor using ultrasonic waves may be adopted as the human sensor 115.
  • the human sensor 115 constantly detects moving objects, and transmits a signal indicating the result to the master unit 20 via the communication device 150.
  • the infrared LED illumination 117 includes an illuminance sensor that detects illuminance. When the detected illuminance is lower than a predetermined threshold, the infrared LED illumination 117 is lit to illuminate the vicinity of the slave unit 10.
  • the communication device 150 is a device that transmits and receives various signals including a control signal, an image signal, and an audio signal to and from the parent device 20 via the signal line 30.
  • the base unit 20 includes a housing 205 having a substantially rectangular parallelepiped shape.
  • a microphone 211 and a speaker 212 are provided at the lower left front of the housing 205.
  • a display monitor 214 is provided at the front left center of the housing 205, and a warning lamp 216 is provided at the upper left.
  • An operation panel 215 is provided on the front right half of the housing 205.
  • the microphone 211 is a device that converts the voice of the room responder into a voice signal and outputs the voice signal to the slave unit 10 via the first communication device 250.
  • the speaker 212 is a device that converts an audio signal input from the child device 10 into sound and outputs the sound.
  • the display monitor 214 is a liquid crystal monitor including a liquid crystal panel and a drive circuit, for example.
  • the display monitor 214 is a display device that displays an image of the imaging area captured by the camera 113 of the slave unit 10.
  • the warning lamp 216 is, for example, an LED lamp. As will be described later, the warning lamp 216 is turned on in order to alert a room responder when a visitor cannot be identified.
  • the operation panel 215 is a display device in which, for example, a liquid crystal panel, a drive circuit that drives the liquid crystal panel, and a touch pad that is an input device capable of detecting an input position are incorporated at least in part.
  • the operation panel 215 displays a visitor information display area 311 for displaying information related to visitors, and various operation buttons for the room attendant to input instructions.
  • the operation buttons include, for example, a registration correction button 312, a corresponding button 313, and a rejection button 314 shown in FIG.
  • a different display is performed on the operation panel 215 depending on the recognition status of the visitor. Details will be described later.
  • the master unit 20 includes a CPU 201, a ROM 202, a RAM 203, a microphone 211, a speaker 212, a display monitor 214, an operation panel 215, a warning lamp 216, a flash ROM 220, a first communication device 250, and a second communication device 260, all of which are connected to each other by a bus.
  • the CPU 201 controls the entire master device 20.
  • the ROM 202 stores a program necessary for causing the master unit 20 to execute various processes including a main process, which will be described later, and setting values for that purpose.
  • CPU 201 controls the operation of parent device 20 in accordance with a program stored in ROM 202.
  • the RAM 203 is a storage device for temporarily storing various data.
  • the microphone 211, speaker 212, display monitor 214, operation panel 215, and warning lamp 216 are as described above.
  • the first communication device 250 is a device that transmits and receives various signals including a control signal, an image signal, and an audio signal to and from the slave unit 10 via the signal line 30.
  • the second communication device 260 is connected to the public telephone line network 5 via the telephone line 50 and is a device that transfers a call with the handset 10 to an external telephone through the public telephone line network 5.
  • the flash ROM 220 will be described with reference to FIGS. 3 and 4.
  • the flash ROM 220 is a nonvolatile semiconductor memory.
  • the flash ROM 220 is provided with a plurality of storage areas, including a face feature storage area 221, a display screen storage area 222, a notification audio storage area 223, a setting information storage area 224, a moving image storage area 225, and a still image storage area 226.
  • in the face feature storage area 221, face feature data, which is data indicating the facial features of a plurality of persons, and related information, which is information related to those persons, are stored in association with each other.
  • the face feature storage area 221 includes, for example, an ID column, a face feature data column, a name column, a relationship column, a previous visit date column, and a memo column.
  • in the ID column, an ID that is unique information for identifying each record is stored.
  • in the face feature data column, for example, numerical data indicating the positions and shapes of the eyebrows, eyes, nose, mouth, and the like are stored as face feature data. In the present embodiment, it is assumed that feature point data corresponding to the eyes, nose, and mouth is adopted as the face feature data.
  • in the name column, the name of the person from whom the face feature data was extracted (hereinafter referred to as a registrant) is stored.
  • in the relationship column, information indicating the relationship between the registrant and the user of the master unit 20 is stored as the relationship.
  • the user of the master unit 20 is, for example, a resident if the intercom system 1 is installed in a house, or an employee if the intercom system 1 is installed in a company. Therefore, for example, information such as a relative of the resident, a friend, or a nearby resident, or an employee of the company, a customer, or a trader is stored as the information indicating the relationship.
  • the previous visit date column stores, as the previous visit date, the date on which the registrant last visited the house or company where the intercom system 1 is installed.
  • in the memo column, a text to be notified to the room responder when the registrant visits is stored as a memo.
  • the name, relationship, previous visit date, and memo stored in the face feature storage area 221 in correspondence with the face feature data constitute the related information of the registrant.
  • the information is registered by the user of the parent device 20 as appropriate.
  • the previous visit date is information that is automatically updated when it is recognized that the registrant has visited, as will be described later.
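The registrant records described above can be sketched as a simple data structure. This is a hypothetical Python illustration: the class and field names, and the use of (x, y) coordinate pairs as feature point data, are assumptions for illustration, not details taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class Registrant:
    """One record in the face feature storage area 221 (illustrative)."""
    id: int            # ID column: unique identifier of the record
    face_features: dict  # feature point data for eyes, nose, and mouth
    name: str          # name column
    relationship: str  # relationship to the user of the master unit 20
    last_visit: str = ""  # previous visit date, auto-updated on a recognized visit
    memo: str = ""     # text shown to the room responder on a visit


# Example record, loosely modeled on the registrant with ID "1".
a_san = Registrant(
    id=1,
    face_features={"eyes": [(30, 40), (70, 40)], "nose": [(50, 60)], "mouth": [(50, 80)]},
    name="A",
    relationship="acquaintance",
    last_visit="2008-09-01",
    memo="waiting for return",
)
```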
  • in the display screen storage area 222, screen templates to be displayed on the operation panel 215 of the master unit 20 are stored.
  • on the operation panel 215, for example, as shown in FIG. 2, a screen having operation buttons with which the user of the master unit 20 inputs information related to the visitor and various instructions is displayed.
  • a plurality of display screen templates are stored in the display screen storage area 222.
  • in the notification audio storage area 223, audio data of the audio to be output from the speaker 212 of the master unit 20 or the speaker 112 of the slave unit 10 is stored.
  • Examples of the stored voice data include voice data that is output from the speaker 112 of the child device 10 and prompts a visitor to perform a predetermined action.
  • the setting information stored in the setting information storage area 224 includes, for example, the collation condition and the resolution and magnification used when imaging with the camera 113 of the slave unit 10.
  • the collation condition is information indicating whether collation of face feature data is performed when at least one piece of the feature point data corresponding to the eyes, nose, and mouth has been extracted, or only when all of the feature point data corresponding to the eyes, nose, and mouth have been extracted. In the present embodiment, two resolutions are stored: the normal resolution, which is the default, and a high resolution higher than the normal resolution. Further, three magnifications are stored: the normal magnification, which is the default, a low magnification lower than the normal magnification, and a high magnification higher than the normal magnification.
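The setting information described above (collation condition, two resolutions, three magnifications) can be sketched as follows; the dictionary layout and the concrete pixel and zoom values are illustrative assumptions only.

```python
# Illustrative sketch of the setting information storage area 224.
SETTINGS = {
    # True: collate even if only some of the eyes/nose/mouth feature points
    # were extracted (partial collation); False: collate only when all were.
    "partial_collation": False,
    "resolution": {"normal": (640, 480), "high": (1280, 960)},
    "magnification": {"low": 0.5, "normal": 1.0, "high": 2.0},
}


def imaging_conditions(resolution="normal", magnification="normal"):
    """Return the (resolution, magnification) pair sent to the slave unit."""
    return (SETTINGS["resolution"][resolution],
            SETTINGS["magnification"][magnification])
```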
  • in the moving image storage area 225, moving images of the imaging area are stored. Although details will be described later, in the present embodiment, a moving image captured by the camera 113 is stored in the moving image storage area 225 when the human sensor 115 detects an object whose face area cannot be detected for a predetermined time.
  • the main process shown in FIG. 5 is started when the power of the main unit 20 is turned on, is continuously repeated while the power is on, and ends when the power is turned off.
  • a reset process is performed (S1). Specifically, all the information stored in the RAM 203 is deleted, and all the various flags stored in the flag storage area (not shown) of the RAM 203 are turned off.
  • the flags that are turned off in step S1 are a detection flag, a detection error flag, and a feature extraction error flag.
  • the detection flag is a flag indicating whether or not a moving object is detected by the human sensor 115. Specifically, when the detection flag is ON, it indicates that a moving object has been detected, and when it is OFF, it indicates that it has not been detected.
  • the detection error flag is a flag indicating whether or not a face area has been detected. Specifically, when the detection error flag is ON, it indicates that the face area cannot be detected, that is, there is a detection error, and when it is OFF, it indicates that the face area has been detected.
  • the feature extraction error flag is a flag indicating whether or not all facial feature points have been extracted. Specifically, when the feature extraction error flag is ON, it indicates that not all of the facial feature points could be extracted, that is, there was a feature extraction error; when it is OFF, it indicates that all of the facial feature points could be extracted.
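The three flags and the reset process of step S1 can be sketched as follows; the flag names are illustrative stand-ins for the values the patent stores in the flag storage area of the RAM 203.

```python
# Minimal sketch of the flag storage area and the reset process (step S1).
flags = {
    "detection": False,                 # ON once the human sensor 115 detects a moving object
    "detection_error": False,           # ON when no face area can be detected
    "feature_extraction_error": False,  # ON when not all feature points are extracted
}


def reset_process(flags):
    """Step S1: turn every stored flag OFF."""
    for name in flags:
        flags[name] = False
    return flags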
  • the CPU 201 determines whether or not the detection flag is ON (S2). In the first pass, the detection flag is OFF (S2: NO). In this case, the CPU 201 determines whether or not a signal indicating that the human sensor 115 of the slave unit 10 has detected a moving object, that is, an object that appears to be a visitor, has been input from the slave unit 10 to the master unit 20 (S3). While no moving object is detected, the CPU 201 stands by (S3: NO, S2: NO).
  • when a moving object is detected (S3: YES), the detection flag turned OFF in step S1 is turned ON (S4).
  • the first communication device 250 transmits a control signal for starting imaging by the camera 113 and data of predetermined imaging conditions to the slave unit 10 via the signal line 30 (S5).
  • the data of the predetermined imaging condition transmitted at this time includes the normal resolution that is the default resolution and the data of the normal magnification that is the default magnification, which are stored in the setting information storage area 224 of the flash ROM 220. .
  • the resolution and magnification of the camera 113 are set according to the received normal resolution and normal magnification.
  • after the slave unit 10 receives the imaging start control signal transmitted in step S5, the camera 113 continuously performs imaging.
  • the image signal output from the camera 113 is transmitted to the parent device 20 via the signal line 30 by the communication device 150.
  • processing for converting the image signal received by first communication device 250 into data that can be displayed on display monitor 214 is performed according to a separately executed program. That is, the image captured by the camera 113 of the slave unit 10 can be displayed on the display monitor 214 in real time.
  • in the recognition determination process, the CPU 201 first acquires one frame of the image signal transmitted from the slave unit 10 and received by the first communication device 250, and generates still image data (S100). Based on the generated still image data, it is determined whether or not a face area can be detected (S101). Any known method may be employed for detecting the face area; for example, matching against a face pattern stored in advance or detecting a skin-color region can be employed. If a face area can be detected from the still image data (S101: YES), the face area is detected (S102).
  • next, it is determined whether or not facial feature points can be extracted from the detected face area (S103).
  • it is determined that the facial feature points can be extracted only when feature points corresponding to all of the eyes, nose, and mouth can be extracted (S103: YES).
  • in that case, the feature points of the eyes, nose, and mouth, which are the facial feature points, are extracted from the face area, and numerical data indicating their positions and shapes are obtained as the visitor's face feature data (S104).
  • the facial feature data of the registrant is read in order from the facial feature storage area 221 of the flash ROM 220 and collated with the facial feature data of the visitor (S105).
  • if, as a result of the collation in S105, the face feature data of the current visitor does not match that of any registrant, the visitor is determined to be an unregistered person (S109).
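The recognition determination flow of steps S100 to S109 can be sketched as follows. This is a hedged illustration: the face detector, the feature extractor, and the equality-based match test are stand-ins passed in as parameters, not the patent's actual algorithms.

```python
def recognition_determination(frame, registrants, detect_face, extract_features):
    """Sketch of FIG. 6. Returns one of:
    ('registrant', record), ('unregistered', None),
    ('detection_error', None), ('feature_extraction_error', None)."""
    face = detect_face(frame)                      # S101/S102: find the face area
    if face is None:
        return ("detection_error", None)           # S111 -> detection error process
    features = extract_features(face)              # S103/S104: extract feature points
    if not all(k in features for k in ("eyes", "nose", "mouth")):
        return ("feature_extraction_error", None)  # S112 -> feature extraction error process
    for r in registrants:                          # S105: collate in stored order
        if r["face_features"] == features:         # stand-in match test
            return ("registrant", r)
    return ("unregistered", None)                  # S109: no registrant matched
```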
  • if it is determined in step S101 that the face area cannot be detected from the generated still image (S101: NO), the fact that the visitor's face cannot be identified is attributed to the failure to detect the face area. Therefore, treating this as a detection error, the detection error flag stored in the flag storage area of the RAM 203 is turned ON (S111). Then, the detection error process is performed (S120 and FIG. 7).
  • the detection error process is a process that prompts the visitor to perform a predetermined action so that the face area can be detected.
  • in the detection error process, it is determined whether or not a predetermined time has elapsed since the moving object was detected by the human sensor 115 (S121). Specifically, for example, when the detection flag is turned ON in step S4 of the main process shown in FIG. 5, a timer (not shown) is started to measure the elapsed time, and it is determined whether or not the elapsed time exceeds a predetermined threshold. If the predetermined time has not elapsed (S121: NO), the detection error process ends as it is, because the face area may still be detected within the predetermined time. The process returns to the recognition determination process shown in FIG. 6, and then returns to the main process shown in FIG. 5.
  • if the moving object detected during the predetermined time is the same object (S122: YES), it is possible that the moving object is a visitor but the face area cannot be detected because the visitor is not standing at an appropriate position.
  • in this case, a control signal and data for causing the speaker 112 of the slave unit 10 to output a voice instructing the visitor to take a predetermined action for the case where the face area cannot be detected are transmitted to the slave unit 10 (S123).
  • voice data of a predetermined instruction voice stored in the notification voice storage area 223 of the flash ROM 220 is read, converted into a voice signal, and transmitted to the slave unit 10.
  • for example, a voice instructing the visitor to stand at a position where the camera 113 can photograph the face, such as "Please stand in front of the slave unit," is output.
  • accordingly, by following the voice instruction, the visitor can take an action appropriate to the recognition error that the face area cannot be detected.
  • next, the CPU 201 acquires the image signal transmitted from the slave unit 10, generates moving image data, and stores it in the moving image storage area 225 of the flash ROM 220 (S126).
  • the CPU 201 turns on the warning lamp 216 (S127) to notify the user of the parent device 20 in the room that there is an unidentified visitor. Then, the detection error process ends. The process returns to the recognition determination process shown in FIG. 6, and then returns to the main process shown in FIG.
  • the face area detection error may also be caused by the face being too close to the camera 113 and thus outside the shooting area. Therefore, in step S126, instead of recording, the slave unit 10 may be instructed to capture a still image at the low magnification (wide angle), and a low-magnification still image may be generated from the obtained image signal. By changing the magnification in this way, the room responder may be able to visually recognize the visitor's face.
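The detection error process of FIG. 7 can be sketched as follows; the slave-unit I/O (voice output, recording, warning lamp) is represented by callback parameters, and only the branches described above are modeled.

```python
def detection_error_process(elapsed, threshold, same_object,
                            play_instruction, start_recording, light_lamp):
    """Sketch of FIG. 7 (S121-S127); all names are illustrative."""
    if elapsed <= threshold:  # S121: NO - the face area may still be found
        return "wait"
    if not same_object:       # S122: NO - branch not detailed in the text
        return "wait"
    # S123: instruct the visitor via the speaker 112 of the slave unit 10.
    play_instruction("Please stand in front of the slave unit")
    start_recording()         # S126: store a moving image for later review
    light_lamp()              # S127: warning lamp 216 alerts the room responder
    return "warned"
```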
  • when it is determined in step S103 of the recognition determination process shown in FIG. 6 that at least one of the eye, nose, and mouth feature points cannot be extracted from the face area (S103: NO), the fact that the visitor's face cannot be identified is attributed to the failure to extract the facial feature points. Therefore, treating this as a feature extraction error, the feature extraction error flag stored in the flag storage area of the RAM 203 is turned ON (S112). Then, the feature extraction error process is performed (S130 and FIG. 8).
  • the feature extraction error process is a process that prompts the visitor to perform a predetermined action so that all feature points of the face can be extracted.
  • in the feature extraction error process, first, which region within the face area is concealed is identified (S131). For example, depending on whether the feature points that were not extracted in the recognition determination process correspond to the eyes, nose, or mouth, it can be identified whether the concealed region is the eye area, the nose area, or the mouth area. Alternatively, the concealed region may be identified based on the luminance value of each pixel in the face area.
  • as causes of a region being concealed, for example, when it is identified that the eye area is concealed, a case where the visitor is wearing sunglasses is assumed; when the nose or mouth area is concealed, a case where the visitor is wearing a mask is assumed.
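Step S131's inference of the concealed region from the missing feature points can be sketched as follows; the mapping from missing regions to assumed causes (sunglasses, mask) follows the text above, while the function name and return shape are assumptions.

```python
def identify_concealed_region(extracted):
    """Sketch of S131: infer the concealed regions and a guessed cause
    from the set of regions whose feature points WERE extracted."""
    missing = {r for r in ("eyes", "nose", "mouth") if r not in extracted}
    if missing == {"eyes"}:
        return missing, "sunglasses"      # eye area concealed -> sunglasses assumed
    if missing and missing <= {"nose", "mouth"}:
        return missing, "mask"            # nose/mouth concealed -> mask assumed
    return missing, "unknown"             # nothing missing, or an unmodeled mix
```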
  • partial collation refers to collating a part of the registrant's face feature data against the feature points that could be extracted when not all of the feature points corresponding to the eyes, nose, and mouth can be extracted.
  • the collation condition which is information indicating whether or not partial collation is performed, is set by the user and stored in the setting information storage area 224 of the flash ROM 220 as described above.
  • if the collation condition indicates that partial collation is not to be performed (S132: NO), the visitor is prompted to take a predetermined action for the case where facial feature points cannot be extracted.
  • specifically, a control signal and data for causing the speaker 112 of the slave unit 10 to output a voice prompting this action are transmitted to the slave unit 10 (S133).
  • voice data of a predetermined instruction voice stored in the notification voice storage area 223 of the flash ROM 220 is read, converted into a voice signal, and transmitted to the slave unit 10.
  • an instruction to show the concealed area is given; for example, a voice such as "Please show your eyes" is output. Therefore, by following the voice instruction, the visitor can take an action appropriate to the recognition error that the facial feature points cannot be extracted.
  • an instruction for imaging at a high resolution is transmitted to the slave unit 10 (S141).
  • the CPU 201 sets the high resolution read from the setting information storage area 224 of the flash ROM 220 as the resolution of the camera 113 and transmits a control signal for imaging to the slave unit 10.
  • high-resolution imaging is performed here because, although the facial feature points could not be extracted at the normal resolution, a room responder may be able to visually recognize the visitor's face in a high-resolution image.
  • the CPU 201 acquires one frame of image signal transmitted from the child device 10 to generate a high-resolution still image and stores it in the still image storage area 226 of the flash ROM 220 (S142).
  • in step S141, the CPU 201 may instruct imaging at the high magnification (zoom) stored in the setting information storage area 224 of the flash ROM 220 instead of imaging at the high resolution. If the facial feature points cannot be extracted, it may be because the visitor is too far from the camera 113. Therefore, by changing the magnification in this way, the room responder may be able to visually recognize the visitor's face.
  • the CPU 201 transmits an instruction to return the resolution of the camera 113 to the normal resolution to the slave unit 10 (S143). Specifically, the normal resolution read from the setting information storage area 224 of the flash ROM 220 is set as the resolution of the camera 113 and a control signal for imaging is transmitted to the slave unit 10. Thereafter, the warning lamp 216 is turned on (S144) to notify the user of the parent device 20 in the room that there is an unidentified visitor. Then, the feature extraction error process ends. The process returns to the recognition determination process shown in FIG. 6, and then returns to the main process shown in FIG.
  • if it is determined in step S132 that the collation condition stored in the setting information storage area 224 indicates that partial collation is to be performed (S132: YES), collation with the registrants' face feature data is performed using only the feature points that could be extracted (S135).
  • for example, during the spring pollen season, more people wear masks as a countermeasure against hay fever. It is harsh to instruct such visitors to remove their masks and show their noses and mouths. In such a case, therefore, the collation condition can be set in advance so that partial collation is performed.
  • in the partial collation, for example, when the feature points corresponding to the eyes cannot be extracted and only the feature points corresponding to the nose and mouth can be extracted, the data obtained from the nose and mouth feature points are collated against the nose and mouth data of each registrant's face feature data. If the data does not match that of any registrant (S136: NO), an instruction for imaging at the high resolution is transmitted to the slave unit 10 (S141) as described above, a high-resolution still image is stored in the still image storage area 226 (S142), and the warning lamp 216 is turned on (S144). Then, the feature extraction error process ends, the process returns to the recognition determination process shown in FIG. 6, and then returns to the main process shown in FIG. 5.
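The partial collation of steps S135 and S136 can be sketched as follows; the equality-based match test is a stand-in for whatever similarity measure an actual implementation would use.

```python
def partial_collation(visitor_features, registrants):
    """Sketch of S135/S136: compare only the regions that could be
    extracted against each registrant's stored face feature data.
    Returns the first matching registrant record, or None."""
    for r in registrants:
        if all(r["face_features"].get(region) == data
               for region, data in visitor_features.items()):
            return r  # S136: YES -> visitor candidate (S137), lower reliability
    return None       # S136: NO -> high-resolution image and warning lamp
```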
  • if it is determined in step S136 that, in the collation using only some of the extracted feature points, the visitor's data matches the face feature data of one of the registrants (S136: YES), that registrant is identified as a visitor candidate (S137).
  • the person is identified only as a candidate because the match is based on a part of the facial feature points, so the reliability of the result is not as high as when all of the feature points are used.
  • the feature extraction error process ends, the process returns to the recognition determination process shown in FIG. 6, and further returns to the main process shown in FIG.
  • step S2 it is determined whether or not the detection flag is ON.
  • the process returns to step S2 after step S11, the moving object has already been detected once, and the detection flag is ON (S2: YES).
  • S23 NO
  • the information stored in the RAM 203 is erased, various flags are turned OFF, and the process for the next visitor is performed as described above.
  • If the determination in step S23 is affirmative (S23: YES), the CPU 201 first outputs a ringing tone from the speaker 212 of the master unit 20 and, based on display data generated from the image signal transmitted from the slave unit 10, displays the image captured by the camera 113 on the display monitor 214, for example as shown in FIG. 2 (S201). It is then determined whether or not a visitor or a visitor candidate has been identified, in order to change the display on the operation panel 215 according to the recognition status of the visitor (S202).
  • When a visitor or a visitor candidate has been identified (S202: YES), the CPU 201 displays the registrant mode screen 310 shown in FIG. 2 on the operation panel 215 (S213).
  • The registrant mode screen 310 is provided with a visitor information display area 311, a registration correction button 312, a response button 313, and a reject button 314.
  • The registrant mode screen 310 may be created by inserting information on the visitor or visitor candidate into a template stored in the display screen storage area 222 (see FIG. 3) of the flash ROM 220. The same applies to the other screens described below.
  • The visitor information display area 311 displays information on the visitor or visitor candidate. Specifically, for example, when a visitor has been identified as a registrant, the message "The visitor is the following registrant" is displayed, and when a visitor candidate has been identified, the message "The candidate for the visitor is the following registrant" is displayed. In addition, the registrant's name, relationship, previous visit date, and memo stored as related information in the face feature storage area 221 are read and displayed.
  • FIG. 2 is an example of the registrant mode screen 310 when the registrant with ID "1", among the registrants whose data are stored in the face feature storage area 221 of FIG. 4, has been identified as the visitor. Accordingly, this registrant's related information is displayed: the name "A", the relationship "acquaintance of daddy", the previous visit date "September 1, 2008", and the memo "waiting for daddy to return".
  • The registration correction button 312 is a button for inputting an instruction to shift to a correction screen when the room responder wants to correct the displayed registrant's related information.
  • The response button 313 is a button for inputting an instruction to start a call with the slave unit 10 when the room responder wants to respond to the visitor directly.
  • The reject button 314 is a button for inputting a proxy response instruction when the room responder does not want to respond to the visitor directly.
  • As described above, when a visitor or a visitor candidate has been identified, a real-time image transmitted from the slave unit 10 is displayed on the display monitor 214, and the related information of the visitor or visitor candidate is displayed on the operation panel 215. Therefore, the room responder can easily know whether or not the visitor is a known person and, if so, who the visitor is.
  • When the matching has been performed with only a part of the facial feature points, the visitor is displayed as a candidate. Therefore, even in this case, the room responder can easily guess who the visitor is.
  • When it is determined in step S202 that neither a visitor nor a visitor candidate has been identified (S202: NO), a control signal and data for outputting, from the speaker 112 of the slave unit 10, a notification sound indicating that the connection is being established are transmitted to the slave unit 10 (S203). As a result, for example, a voice message "Connected. Please wait a moment." is output.
  • The caution mode screen 320 is provided with a caution information display area 321, a high-resolution image display area 322, a warning release button 323, and a reject button 314, for example as shown in FIG. 10.
  • The caution information display area 321 displays information prompting caution regarding the visitor. Specifically, for example, the message "Attention! Visitor confirmation is required" is displayed.
  • In the high-resolution image display area 322, the high-resolution still image that was generated when not all feature points could be extracted in the feature extraction error process (S142 in FIG. 8) and stored in the still image storage area 226 of the flash ROM 220 is read and displayed.
  • The warning release button 323 is a button for inputting an instruction to display the response button 313 when the room responder, after checking both the normal-resolution real-time image displayed on the display monitor 214 and the high-resolution still image displayed in the high-resolution image display area 322, wants to respond to the visitor directly.
  • the reject button 314 is as described in connection with the registrant mode screen 310.
  • As described above, when a feature extraction error has occurred, a real-time image transmitted from the slave unit 10 is displayed on the display monitor 214, and a high-resolution still image of the imaging area is displayed on the operation panel 215. Therefore, even if part of the visitor's face is hidden and the visitor cannot be identified by collation with the facial feature data, as shown for example in FIG. 10, the room responder can easily determine, by looking at both screens, whether the visitor really is a registered person and whether or not the visitor is suspicious.
  • If it is determined in step S205 that no feature extraction error has occurred in the recognition determination process (S205: NO), it is determined whether a detection error has occurred (S208). Specifically, if the face area could not be detected and the detection error flag is ON (S111 in FIG. 6), it is determined that a detection error has occurred (S208: YES). In this case, the CPU 201 displays the warning mode screen 330 shown in FIG. 11 on the operation panel 215 (S209).
  • The warning mode screen 330 is provided with a warning information display area 331, a moving image display area 332, a transfer button 333, a warning release button 323, and a reject button 314, for example as shown in FIG. 11.
  • In the warning information display area 331, information warning that the visitor is a person requiring caution is displayed. Specifically, for example, the message "Warning!!! There is a person who needs attention" is displayed.
  • In the moving image display area 332, when the same object has been detected for a predetermined time in the detection error process (S126 in FIG. 7), the moving image stored in the moving image storage area 225 of the flash ROM 220 is reproduced and displayed.
  • the transfer button 333 is a button for inputting an instruction to transfer the audio data input from the slave unit 10 to a predetermined transfer destination telephone.
  • The warning release button 323 and the reject button 314 are as described in relation to the caution mode screen 320 and the registrant mode screen 310, respectively.
  • As described above, when a detection error has occurred, a real-time image transmitted from the slave unit 10 is displayed on the display monitor 214, and the recorded moving image is played back on the operation panel 215. Therefore, even when a person is loitering, as shown for example in FIG. 11, the room responder can confirm whether or not the person is suspicious from both the real-time image from the camera 113 and the recorded video.
  • If it is determined that no detection error has occurred either (S208: NO), the CPU 201 displays an unregistered mode screen (not shown) on the operation panel 215 (S211).
  • The unregistered mode screen is provided with an unregistered person notification area (not shown) in place of the visitor information display area 311 of the registrant mode screen 310 shown in FIG. 2, in which a message indicating that the visitor is an unregistered person is displayed. A response button 313 and a reject button 314 are also provided.
  • In addition, a registration button or the like may be provided for inputting an instruction to shift to a screen for newly registering the visitor's facial feature data and the like in the face feature storage area 221.
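The choice among the four screens described above (S213, S206, S209, S211) follows directly from the determinations of S202, S205, and S208. A minimal sketch of that dispatch follows; the flag names are placeholders, since the embodiment's internal variable names are not given.

```python
# Screen selection mirroring S202 / S205 / S208. Flag names are
# illustrative placeholders, not the embodiment's actual identifiers.

def select_screen(identified, candidate, feature_error, detection_error):
    if identified or candidate:
        return "registrant_mode_310"   # S213: visitor or candidate identified
    if feature_error:
        return "caution_mode_320"      # S206: feature extraction error occurred
    if detection_error:
        return "warning_mode_330"      # S209: face area could not be detected
    return "unregistered_mode"         # S211: recognized but not registered
```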
  • As described above, after the display according to the visitor's recognition status has been performed on the operation panel 215 (S213, S206, S209, or S211), it is determined whether or not a panel operation has been performed, specifically, whether or not an input from the touch pad of the operation panel 215 has been detected (S216). When no panel operation has been performed (S216: NO), it is determined whether or not a predetermined time (for example, 1 minute) has elapsed (S217). For example, the elapsed time may be measured by a timer started when pressing of the call button 114 is detected, and it may be determined whether or not a threshold has been exceeded. While the predetermined time has not elapsed (S217: NO), the CPU 201 returns to the determination of whether or not a panel operation has been performed (S216) and repeats the process.
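The wait loop of S216 and S217 amounts to polling the panel until either an input arrives or the timer started at the call-button press exceeds its threshold. A sketch under the assumption of a simple polling interface; the device's real input mechanism is not specified at this level.

```python
# Sketch of the S216-S217 wait loop: poll the touch panel until an input
# is detected or a preset time (e.g. 1 minute) elapses. The poll function,
# clock, and sleep are stand-ins for the device's actual facilities.
import time

def wait_for_panel_operation(poll_input, timeout_s=60.0,
                             clock=time.monotonic, sleep=time.sleep):
    """Return the panel input, or None if timeout_s elapses first."""
    start = clock()                       # timer started at call-button press
    while clock() - start < timeout_s:    # S217: predetermined time elapsed?
        event = poll_input()              # S216: was the panel touched?
        if event is not None:
            return event
        sleep(0.05)
    return None
```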
  • When a panel operation has been performed (S216: YES), the input instruction is identified from the screen being displayed and the input position detected by the touch pad. First, it is determined whether or not a call start instruction has been given (S221).
  • If a call start instruction has been given (S221: YES), the CPU 201 performs a call start process (S222). Specifically, the CPU 201 separately starts a program for controlling the operation of the master unit 20 related to a call with the slave unit 10. As a result, a communication path between the slave unit 10 and the master unit 20 is formed, and a call can be made between the visitor and the room responder.
  • After the call start process (S222), the visitor process ends, and the process returns to the main process in FIG. 5, where a reset process for the next visitor is performed (S1).
  • If a proxy response instruction has been input by selection of the reject button 314 (S224: YES), the CPU 201 performs a proxy response process (S225). Specifically, a control signal and data for outputting, from the speaker 112 of the slave unit 10, a notification voice indicating that a direct response cannot be given are transmitted to the slave unit 10 (S225). As a result, the slave unit 10 outputs, for example, a voice saying "We are sorry, but we are busy at the moment and cannot respond." Thereafter, the visitor process ends, and the process returns to the main process in FIG. 5, where a reset process for the next visitor is performed (S1).
  • If the instruction input from the operation panel 215 is not a proxy response instruction by selection of the reject button 314 (S224: NO), it is determined whether or not the transfer button 333 has been selected (S227). If the transfer button 333 has been selected (S227: YES), the CPU 201 performs a transfer process (S228). Specifically, a call is made to a telephone number stored in advance as the transfer destination in a predetermined storage area (not shown) of the flash ROM 220.
  • When the corresponding telephone, which may be a mobile phone or a fixed-line phone, answers, it enters a call state with the master unit 20 via the public telephone network 5, and a transferred call between the slave unit 10 and the telephone can then be made via the master unit 20.
  • Thereafter, the visitor process ends, and the process returns to the main process in FIG. 5, where a reset process for the next visitor is performed (S1).
  • If the instruction input from the operation panel 215 is not a transfer instruction by selection of the transfer button 333 (S227: NO), other processing according to the input instruction is performed.
  • For example, when the warning release button 323 is selected on the caution mode screen 320 shown in FIG. 10 or the warning mode screen 330 shown in FIG. 11, processing for displaying the response button 313 (see FIG. 2) in place of the warning release button 323 is performed. After the other processing is completed, the process returns to step S216, and processing according to the operation on the operation panel 215 is performed as described above.
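The branching of S221 through S228 can be summarized as a table from the selected button to the resulting process. The handler names below are placeholders standing in for the processes described above, not the embodiment's routine names.

```python
# Dispatch of panel inputs (S221-S228). Keys follow the button reference
# numerals; the action strings stand in for the described processes.

def handle_panel_input(button):
    actions = {
        "response_313":        "call_start_process_S222",
        "reject_314":          "proxy_response_process_S225",
        "transfer_333":        "transfer_process_S228",
        "warning_release_323": "show_response_button_313",
    }
    return actions.get(button, "other_processing")
```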
  • As described above, in the intercom system 1 of the present embodiment, when a visitor is detected, imaging by the camera 113 of the slave unit 10 is started. A face area is then detected from a still image generated from the image signal acquired from the slave unit 10, and feature points are extracted from the face area.
  • In some cases, however, face recognition cannot be performed by collation with the facial feature data of a plurality of persons stored in advance in the face feature storage area 221 of the flash ROM 220. In many such cases, the room responder cannot recognize the visitor's face even by viewing the image captured by the camera 113.
  • In such a case, an instruction prompting the visitor to act according to the cause hindering recognition is output as voice from the speaker 112, thereby notifying the visitor. The visitor can therefore take an appropriate action for that cause in accordance with the notified instruction.
  • In addition, the screens 310 to 330, which present information prompting different levels of caution depending on the cause hindering recognition, are displayed on the operation panel 215. The room responder can therefore take appropriate precautions for that cause in accordance with the notified information.
  • The cause is mainly either that the visitor's face area cannot be detected or that the visitor's facial features cannot be extracted.
  • In the intercom system 1, the content of the notification to the visitor and the level of caution notified to the room are changed depending on which of these two causes applies, so that appropriate instructions can be given to the visitor and an appropriate warning can be given to the room.
  • Furthermore, the imaging method (for example, the resolution, the angle of view, or still image versus moving image) is changed according to the cause, and the acquired image is displayed on the operation panel 215 of the master unit 20. The room responder can therefore check different images depending on the cause of the face recognition failure when caution is urged, and can take more appropriate precautions.
  • In addition, the facial feature data of the visitor is collated with the facial feature data of the plurality of registrants stored in the face feature storage area 221 of the flash ROM 220, and it is determined whether there is a matching person.
  • Information prompting different levels of caution according to the determination result is notified to the room through the operation panel 215. The room responder can therefore know whether or not the visitor is a person to be wary of, making it easier to avoid a careless response.
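The two-cause policy summarized above can be stated as a small table: the cause hindering recognition selects both the instruction voiced to the visitor and the caution level presented in the room. The message strings below are paraphrases for illustration, not the embodiment's exact wording.

```python
# Cause-dependent notification policy (illustrative paraphrase).
CAUSE_POLICY = {
    # cause                    (voice instruction to visitor,          room caution level)
    "face_area_not_detected": ("please stand in front of the camera", "warning"),
    "features_not_extracted": ("please uncover your face",            "caution"),
}

def notify(cause):
    # Returns what each side of the system would present for this cause.
    instruction, level = CAUSE_POLICY[cause]
    return {"slave_speaker_112": instruction, "operation_panel_215": level}
```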
  • The configuration and processing shown in the above embodiment are merely examples, and it goes without saying that various modifications are possible.
  • In the above embodiment, the facial feature data of the visitor obtained in the recognition determination process (see FIG. 6) is collated with the facial feature data of the registrants stored in the face feature storage area 221 of the flash ROM 220 in an attempt to identify the visitor.
  • the process of step S211 does not need to be performed.
  • the attention mode screen 320 (see FIG. 10) or the warning mode screen 330 (see FIG. 11) is displayed according to the type of error. If the facial feature points have been successfully extracted, the image captured by the camera 113 is simply displayed on the display monitor 214 (S201 in FIG. 9). Even in this case, since the detection of the face area and the extraction of the facial feature points have succeeded, the room responder can recognize the visitor's face by looking at the image displayed on the display monitor 214.
  • Alternatively, a display monitor may be provided in the slave unit 10 to display information prompting different actions depending on the cause when the visitor's face cannot be recognized.
  • In the above embodiment, different screens 310 to 330 are displayed on the operation panel 215 of the master unit 20 depending on the reason why the visitor's face cannot be recognized, thereby notifying the room responder. However, it is not always necessary to notify the room responder by display. Instead, messages prompting different levels of caution, such as "This is an acquaintance of daddy", "Please check the image carefully", or "Be wary, as this seems to be a suspicious person", may be output as voice from the speaker 212 of the master unit 20. Further, the warning lamp 216 may blink at different intervals for each caution level, or a plurality of warning lamps 216 with different emission colors may be provided so that a lamp of a different color is lit for each level.
  • In the above embodiment, a moving object in front of the slave unit 10 is detected by the human sensor 115, but the human sensor 115 does not necessarily have to be provided in the slave unit 10. Instead, a moving object may be detected from changes in the images captured by the camera 113.
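The modification above, detecting a moving object from image change rather than with the human sensor 115, could be realized by simple frame differencing. A minimal sketch follows, with grayscale frames represented as nested lists; both threshold values are illustrative assumptions.

```python
# Illustrative frame-differencing motion detector: count pixels whose
# brightness changed by more than pixel_thresh between two frames, and
# report motion when enough pixels changed. Thresholds are assumptions.

def motion_detected(prev_frame, cur_frame, pixel_thresh=25, count_thresh=4):
    changed = 0
    for prev_row, cur_row in zip(prev_frame, cur_frame):
        for p, c in zip(prev_row, cur_row):
            if abs(p - c) > pixel_thresh:
                changed += 1
    return changed >= count_thresh
```

A production system would typically add noise filtering and background modeling, but the principle of replacing the dedicated sensor with image change is the same.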
  • In the above embodiment, the master unit 20 includes the flash ROM 220, and voice is output from the slave unit 10 by transmitting the instruction voice data and the like stored therein to the slave unit 10. However, the instruction voice data does not necessarily have to be stored in the master unit 20; it may be stored in the slave unit 10 by providing the slave unit with a flash ROM. In that case, only an instruction specifying the data is transmitted from the master unit 20 to the slave unit 10, and the CPU 101 of the slave unit 10 outputs the voice in accordance with the instruction.

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Interconnected Communication Systems, Intercoms, And Interphones (AREA)
  • Closed-Circuit Television Systems (AREA)
  • Collating Specific Patterns (AREA)
  • Image Input (AREA)
  • Burglar Alarm Systems (AREA)
  • Alarm Systems (AREA)

Abstract

Provided is an intercom system that includes a sub-unit which is placed outside a house, and a base unit which is placed inside the house. The sub-unit includes an imaging means that images a predetermined area in front of the sub-unit and outputs image information about the image of the area, a visitor detection means that detects a visitor, and a sub-unit presenting means that presents information to the visitor. The base unit includes an image acquisition means that acquires the image information when a visitor is detected, a recognition determination means that determines, from the acquired image information, whether the face of the visitor can be recognized, a sub-unit presenting control means that controls, if it is determined that the face of the visitor cannot be recognized, the sub-unit presenting means to allow the sub-unit presenting means to present information representing a message to prompt the visitor to take another action depending on why the recognition of the visitor's face is hampered, a base-unit presenting means that presents information to a person in the house, and a first base-unit presentation control means that controls the base-unit presenting means to transmit information for giving warning of a level which varies according to why the recognition of the visitor's face is hampered.

Description

Intercom system
The present invention relates to an intercom system, and more specifically, to an intercom system including a slave unit installed outdoors and a master unit installed indoors.
Conventionally, an interphone system including a slave unit installed outside a house and a master unit installed indoors is known. Among such interphones, there are interphones that can capture an image of a visitor with a camera provided in the slave unit and display the captured image on a monitor provided in the master unit. The main purpose of such an intercom system is to enable a resident in the room to look at the visitor's face displayed on the monitor, confirm who the visitor is, and respond appropriately. Accordingly, some interphones ask the visitor for some action when an image from which the resident can recognize the visitor's face cannot be acquired. For example, Patent Literature 1 discloses an intercom device that, when a normal face cannot be recognized from the captured image, outputs from the speaker of the slave unit a voice instructing the visitor to move to a predetermined position.
JP 2005-92262 A
The intercom device described in Patent Literature 1 always gives the same instruction to the visitor even though there are multiple possible causes for a face not being recognizable from the captured image. Therefore, even if the visitor follows the instruction, the response may not be an appropriate one. Moreover, if the visitor is deliberately acting so that his or her face is not recognized, the visitor may not follow the instruction at all. When the visitor's face cannot be recognized, the resident may end up responding without any caution even if the visitor is a malicious person.
The present invention has been made to solve the above problems, and an object thereof is to provide an intercom system capable of giving the visitor appropriate instructions for dealing with the situation, and of prompting the resident to take the necessary caution, when an image from which the visitor's face can be recognized cannot be acquired.
According to the present invention, there is provided an intercom system comprising a slave unit installed outdoors and a master unit connected to the slave unit and installed indoors, wherein the slave unit comprises: imaging means for imaging a predetermined imaging range in front of the slave unit and outputting captured image information, which is information on an image of the imaging range; visitor detection means for detecting a visitor; and slave-unit notification means for notifying the visitor of information; and wherein the master unit comprises: image acquisition means for acquiring the captured image information output from the imaging means when the visitor is detected by the visitor detection means; recognition determination means for determining, based on the captured image information acquired by the image acquisition means, whether or not the visitor's face can be recognized; slave-unit notification control means for controlling the slave-unit notification means, when the recognition determination means determines that the visitor's face cannot be recognized, to notify the visitor of information prompting different actions depending on the cause hindering recognition; master-unit notification means for notifying information to the room; and first master-unit notification control means for controlling the master-unit notification means to notify information prompting different levels of caution depending on the cause.
FIG. 1 is a block diagram showing the electrical configuration of the intercom system 1.
FIG. 2 is an external front view of the master unit 20 displaying the registrant mode screen 310.
FIG. 3 is an explanatory diagram of the storage areas of the flash ROM 220 of the master unit 20.
FIG. 4 is an explanatory diagram showing an example of the data stored in the face feature storage area 221 of the flash ROM 220.
FIG. 5 is a flowchart of the main process of the master unit 20.
FIG. 6 is a flowchart of the recognition determination process executed during the main process of FIG. 5.
FIG. 7 is a flowchart of the detection error process executed during the recognition determination process of FIG. 6.
FIG. 8 is a flowchart of the feature extraction error process executed during the recognition determination process of FIG. 6.
FIG. 9 is a flowchart of the visitor process executed during the main process of FIG. 5.
FIG. 10 is an external front view of the master unit 20 displaying the caution mode screen 320.
FIG. 11 is an external front view of the master unit 20 displaying the warning mode screen 330.
Hereinafter, an embodiment of an intercom system 1 embodying the present invention will be described with reference to the drawings. These drawings are used to explain technical features that the present invention can adopt; the device configurations, the flowcharts of the various processes, and the like described herein are merely illustrative examples and are not intended to be limiting.
First, with reference to FIGS. 1 to 4, the overall configuration of the intercom system 1 according to the present embodiment and the configurations of the slave unit 10 and the master unit 20, which are its components, will be described in order.
The schematic configuration of the intercom system 1 will be described with reference to FIG. 1. As shown in FIG. 1, the intercom system 1 of the present embodiment includes a slave unit 10 and a master unit 20 connected via a signal line 30. In a house, company, building, or the like, the slave unit 10 is installed outdoors and the master unit 20 is installed indoors. Since a call can be made between the slave unit 10 and the master unit 20, a person in the room (hereinafter referred to as a room responder) can respond to a visitor outside without opening the entrance.
The slave unit 10 will be described with reference to FIG. 1. First, the physical configuration of the slave unit 10 will be described. Although not shown in detail, the slave unit 10 has a substantially rectangular parallelepiped housing, and a microphone 111, a speaker 112, a camera 113, a call button 114, and an infrared LED illumination 117 are provided on the front face of the housing (the surface facing the visitor).
The electrical configuration of the slave unit 10 will be described with reference to FIG. 1. As shown in FIG. 1, the slave unit 10 includes a CPU 101, a ROM 102, a RAM 103, a microphone 111, a speaker 112, a camera 113, a call button 114, a human sensor 115, an infrared LED illumination 117, and a communication device 150, all of which are interconnected by a bus.
The CPU 101 controls the entire slave unit 10. The ROM 102 stores programs necessary for the basic operation of the slave unit 10 and setting values for those programs. The CPU 101 controls the operation of the slave unit 10 in accordance with the programs stored in the ROM 102. The RAM 103 is a storage device for temporarily storing various data.
The microphone 111 is a device that converts the input voice of a visitor into a voice signal and outputs it to the master unit 20 via the communication device 150. The speaker 112 is a device that converts a voice signal input from the master unit 20 into voice and outputs it.
The camera 113 is, for example, a well-known CCD camera. The camera 113 captures a predetermined imaging range in front of the slave unit 10 and outputs an image signal of the captured image to the master unit 20 via the communication device 150. The predetermined imaging range is set in advance so as to include, for example, the area where the face of a visitor standing facing the front of the slave unit 10 is expected to be located when imaging at the normal magnification. The camera 113 of the present embodiment is capable of imaging at different resolutions and at different magnifications, that is, at different angles of view using a zoom function, and of capturing both still images and moving images. The call button 114 is a button with which a visitor calls the room responder. When the call button 114 is pressed, a call signal is transmitted to the master unit 20 via the communication device 150.
The human sensor 115 is a sensor that detects the movement of an object within a predetermined area in front of the slave unit 10. As the human sensor 115, for example, an infrared sensor that emits infrared light toward an object and detects the object based on changes in the amount of reflected infrared light received can be employed. Alternatively, a human sensor using ultrasonic waves or the like may be employed as the human sensor 115. In the present embodiment, the human sensor 115 constantly performs detection of moving objects and transmits a signal indicating the result to the master unit 20 via the first communication device 250.
The infrared LED illumination 117 includes an illuminance sensor that detects illuminance, and lights up to illuminate the vicinity of the slave unit 10 when the detected illuminance is lower than a predetermined threshold. The communication device 150 is a device that transmits and receives various signals, including control signals, image signals, and voice signals, to and from the master unit 20 via the signal line 30.
Next, the configuration of the master unit 20 will be described with reference to FIGS. 1 to 4. First, the physical configuration of the master unit 20 will be described with reference to FIG. 2. The master unit 20 has a substantially rectangular parallelepiped housing 205. A microphone 211 and a speaker 212 are provided at the lower left of the front of the housing 205. A display monitor 214 is provided at the left center of the front of the housing 205, and a warning lamp 216 is provided at the upper left. An operation panel 215 is provided on the right half of the front of the housing 205.
 マイク211は、室内対応者の音声を音声信号に変換し、第1通信装置250を介して子機10に出力する機器である。スピーカ212は、子機10から入力された音声信号を音声に変換して出力する機器である。表示モニタ214は、例えば、液晶パネルと駆動回路を備えた液晶モニタである。表示モニタ214は、子機10のカメラ113によって撮像された撮像領域の画像が表示される表示装置である。警告ランプ216は、例えば、LEDランプである。警告ランプ216は、後述するように、来訪者が特定できない場合に、室内対応者に警戒を促すために点灯される。 The microphone 211 is a device that converts the voice of the room responder into a voice signal and outputs the voice signal to the slave unit 10 via the first communication device 250. The speaker 212 is a device that converts an audio signal input from the child device 10 into sound and outputs the sound. The display monitor 214 is a liquid crystal monitor including a liquid crystal panel and a drive circuit, for example. The display monitor 214 is a display device that displays an image of the imaging area captured by the camera 113 of the slave unit 10. The warning lamp 216 is, for example, an LED lamp. As will be described later, the warning lamp 216 is turned on in order to alert a room responder when a visitor cannot be identified.
 操作パネル215は、例えば、液晶パネル、液晶パネルを駆動する駆動回路、および入力位置を検知可能な入力装置であるタッチパッドが少なくとも一部に内蔵された表示装置である。操作パネル215には、例えば、図2に示すように、来訪者に関する情報を表示する来訪者情報表示領域311、および、室内対応者が指示を入力する各種の操作ボタンが表示される。操作ボタンには、例えば、図2に示す登録修正ボタン312、対応ボタン313、および拒否ボタン314がある。室内対応者が指で操作ボタンに触れると、その位置がタッチパッドにより検知され、対応する情報が出力される。本実施形態では、操作パネル215には、来訪者の認識状況に応じて異なる表示が行われるが、詳細については後述する。 The operation panel 215 is a display device in which, for example, a liquid crystal panel, a drive circuit that drives the liquid crystal panel, and a touch pad that is an input device capable of detecting an input position are incorporated at least in part. For example, as shown in FIG. 2, the operation panel 215 displays a visitor information display area 311 for displaying information related to visitors, and various operation buttons for the room attendant to input instructions. The operation buttons include, for example, a registration correction button 312, a corresponding button 313, and a rejection button 314 shown in FIG. When the indoor person touches the operation button with a finger, the position is detected by the touch pad, and the corresponding information is output. In the present embodiment, a different display is performed on the operation panel 215 depending on the recognition status of the visitor. Details will be described later.
The electrical configuration of the master unit 20 will be described with reference to FIG. 1. As shown in FIG. 1, the master unit 20 includes a CPU 201, a ROM 202, a RAM 203, a microphone 211, a speaker 212, a display monitor 214, an operation panel 215, a warning lamp 216, a flash ROM 220, a first communication device 250, and a second communication device 260, all of which are connected to one another by a bus.
The CPU 201 controls the entire master unit 20. The ROM 202 stores programs necessary for causing the master unit 20 to execute various processes, including the main process described later, and setting values therefor. The CPU 201 controls the operation of the master unit 20 in accordance with the programs stored in the ROM 202. The RAM 203 is a storage device for temporarily storing various data.
The microphone 211, speaker 212, display monitor 214, operation panel 215, and warning lamp 216 are as described above. The first communication device 250 is a device that transmits and receives various signals, including control signals, image signals, and audio signals, to and from the slave unit 10 via the signal line 30. The second communication device 260 is connected to the public telephone network 5 via the telephone line 50, and is a device that transfers a call with the slave unit 10 to an external telephone through the public telephone network 5.
The flash ROM 220 will be described with reference to FIGS. 3 and 4. The flash ROM 220 is a nonvolatile semiconductor memory. As shown in FIG. 3, the flash ROM 220 includes a plurality of storage areas including, for example, a face feature storage area 221, a display screen storage area 222, a notification audio storage area 223, a setting information storage area 224, a moving image storage area 225, and a still image storage area 226.
In the face feature storage area 221, face feature data, which is data indicating the facial features of a plurality of persons, and related information, which is information about those persons, are stored in association with each other. As shown in FIG. 4, the face feature storage area 221 includes, for example, an ID column, a face feature data column, a name column, a relationship column, a previous visit date column, and a memo column.
In the ID column, an ID, which is unique information for identifying each data entry, is stored. In the face feature data column, numerical data indicating the positions and shapes of, for example, the eyebrows, eyes, nose, and mouth are stored as face feature data. In the present embodiment, it is assumed that feature point data corresponding to the eyes, nose, and mouth are adopted as the face feature data. In the name column, the name of the person from whom the face feature data was extracted (hereinafter referred to as a registrant) is stored. In the relationship column, information indicating the relationship between the registrant and the user of the master unit 20 is stored. The user of the master unit 20 is, for example, a resident if the intercom system 1 is installed in a house, or an employee if the intercom system 1 is installed in a company. Therefore, for example, information such as a resident's relative, friend, or neighbor, or a company's employee, customer, or regular vendor is stored as the information indicating the relationship.
In the previous visit date column, the date on which the registrant last visited the house or company where the intercom system 1 is installed is stored. In the memo column, a note to be presented to the indoor responder when the registrant visits is stored. The name, relationship, previous visit date, and memo stored in the face feature storage area 221 in association with the face feature data constitute the related information of the registrant. Of the information stored in the face feature storage area 221, all items other than the previous visit date are registered by the user of the master unit 20 as appropriate. The previous visit date, on the other hand, is information that is automatically updated when the registrant is recognized as having visited, as will be described later.
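A record of the face feature storage area 221 can be modeled as follows. This is a minimal sketch; the field names and types are illustrative assumptions, and the embodiment stores the data in flash ROM rather than as Python objects.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class RegistrantRecord:
    """One illustrative entry of the face feature storage area 221."""
    id: int                      # unique ID identifying the entry
    face_features: list          # numeric feature-point data for eyes, nose, mouth
    name: str                    # registrant's name
    relationship: str            # e.g. "relative", "friend", "customer"
    previous_visit: Optional[date] = None  # auto-updated on each recognized visit
    memo: str = ""               # note shown to the indoor responder when the registrant visits

    def mark_visited(self) -> None:
        # Corresponds to the automatic update of the previous visit date (cf. S108):
        # all other fields are entered by the user, but this one is set by the system.
        self.previous_visit = date.today()
```

The `mark_visited` helper reflects the distinction drawn above: every field except `previous_visit` is maintained manually, while `previous_visit` is written only by the recognition process.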
In the display screen storage area 222 shown in FIG. 3, templates of the screens to be displayed on the operation panel 215 of the master unit 20 are stored. Although details will be described later, the operation panel 215 displays, for example, as shown in FIG. 2, a screen having information about the visitor and operation buttons with which the user of the master unit 20 inputs various instructions. In the present embodiment, since there are a plurality of types of screens displayed on the operation panel 215, templates for the plurality of display screens are stored in the display screen storage area 222.
In the notification audio storage area 223, audio data of the sounds to be output from the speaker 212 of the master unit 20 or the speaker 112 of the slave unit 10 is stored. The stored audio data includes, for example, audio data of voice messages, output from the speaker 112 of the slave unit 10, that prompt the visitor to take a predetermined action.
In the setting information storage area 224, various setting information used in the processes described later is stored. The stored setting information includes, for example, a matching condition, the resolution used when the camera 113 of the slave unit 10 captures images, and the magnification. The matching condition is information indicating whether matching of face feature data is performed when at least one of the feature point data corresponding to the eyes, nose, and mouth is available, or only when the feature point data for all of the eyes, nose, and mouth are available. In the present embodiment, two values are stored as the resolution: the normal resolution, which is the default, and a resolution higher than the normal resolution. As the magnification, three values are stored: the normal magnification, which is the default, a low magnification lower than the normal magnification, and a high magnification higher than the normal magnification.
In the moving image storage area 225, moving images of the imaging area are stored. Although details will be described later, in the present embodiment, when the human sensor 115 detects, over a predetermined period of time, an object for which no face area can be detected, a moving image captured by the camera 113 is stored in the moving image storage area 225.
The processes executed in the master unit 20 of the intercom system 1 will now be described with reference to FIGS. 5 to 11. The processes described below are executed by the CPU 201 in accordance with the programs stored in the ROM 202.
The main process shown in FIG. 5 starts when the master unit 20 is powered on, is repeated continuously while the power is on, and ends when the power is turned off. First, a reset process is performed (S1). Specifically, all information stored in the RAM 203 is erased, and all flags stored in the flag storage area (not shown) of the RAM 203 are turned OFF.
In the present embodiment, the flags turned OFF in step S1 are a detection flag, a detection error flag, and a feature extraction error flag. The detection flag indicates whether a moving object has been detected by the human sensor 115: ON indicates that a moving object has been detected, and OFF indicates that none has been detected. The detection error flag indicates whether a face area has been detected: ON indicates that the face area could not be detected, that is, a detection error occurred, and OFF indicates that the face area was detected. The feature extraction error flag indicates whether all the facial feature points have been extracted: ON indicates that not all of the facial feature points could be extracted, that is, a feature extraction error occurred, and OFF indicates that all the facial feature points were extracted.
Subsequently, the CPU 201 determines whether the detection flag is ON (S2). In the first iteration, the detection flag is OFF (S2: NO). In this case, the CPU 201 determines whether a signal indicating that the human sensor 115 of the slave unit 10 has detected a moving object, that is, an object presumed to be a visitor, has been input from the slave unit 10 to the master unit 20 (S3). While no moving object is detected, the CPU 201 stands by (S3: NO, S2: NO).
When a signal indicating that the human sensor 115 has detected a moving object is input to the master unit 20 (S3: YES), the detection flag turned OFF in step S1 is turned ON (S4). Subsequently, the first communication device 250 transmits a control signal for causing the camera 113 to start imaging, together with data on predetermined imaging conditions, to the slave unit 10 via the signal line 30 (S5). The data on the predetermined imaging conditions transmitted at this time includes the normal resolution (the default resolution) and the normal magnification (the default magnification) stored in the setting information storage area 224 of the flash ROM 220.
In the slave unit 10, which has received the control signal and the data on the predetermined imaging conditions from the master unit 20, the resolution and magnification of the camera 113 are set according to the received normal resolution and normal magnification. In the present embodiment, after the slave unit 10 receives the imaging start control signal in step S5, the camera 113 performs imaging continuously. The image signal output from the camera 113 is transmitted by the communication device 150 to the master unit 20 via the signal line 30. In the master unit 20, a process of converting the image signal received by the first communication device 250 into data displayable on the display monitor 214 is performed according to a separately executed program. That is, the image captured by the camera 113 of the slave unit 10 becomes displayable on the display monitor 214 in real time.
After the instruction to start imaging (S5), it is determined whether a visitor or a visitor candidate has already been identified (S7). In the first iteration, no visitor or visitor candidate has been identified yet (S7: NO). In that case, a recognition determination process is performed (S10 and FIG. 6). The recognition determination process will be described below with reference to FIGS. 6 to 8.
As shown in FIG. 6, the CPU 201 first acquires one frame of the image signal transmitted from the slave unit 10 and received by the first communication device 250, and generates still image data (S100). Based on the generated still image data, it is determined whether a face area can be detected (S101). Any known method may be employed for detecting the face area; for example, a method of matching against face patterns stored in advance, or a method of detecting a skin-color region, can be employed. If a face area can be detected from the still image data (S101: YES), the face area is detected (S102).
Subsequently, it is determined whether facial feature points can be extracted (S103). Here, it is determined that the facial feature points can be extracted only when the feature points corresponding to all of the eyes, nose, and mouth can be extracted (S103: YES). In this case, the feature points of the eyes, nose, and mouth are extracted from the face area, and numerical data indicating their positions and shapes are obtained as the visitor's face feature data (S104).
The face feature data of the registrants are read in order from the face feature storage area 221 of the flash ROM 220 and matched against the visitor's face feature data (S105). If, as a result of the matching, it is determined that the current visitor's face feature data does not match that of any registrant (S106: NO), the visitor is determined to be an unregistered person (S109). The recognition determination process then ends, and the process returns to the main process shown in FIG. 5.
On the other hand, if it is determined as a result of the matching that the visitor's face feature data matches the face feature data of one of the registrants (S106: YES), the visitor is determined to be the registrant whose face feature data matched (S107). Then, the previous visit date stored in the face feature storage area 221 as the related information of the registrant identified as the visitor is updated to the current date (S108). As a result, when the same registrant next visits, the latest visit date can be displayed on the registrant mode screen 310 (see FIG. 2) described later. After the date is updated, the recognition determination process ends, and the process returns to the main process shown in FIG. 5.
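The branching of the recognition determination process (S100 to S109) can be summarized as follows. This is a minimal sketch: `detect_face`, `extract_features`, and `matches` are hypothetical helpers standing in for the detection, extraction, and matching methods left open above, and the registrant records are represented as plain dictionaries.

```python
def recognize_visitor(frame, registrants, detect_face, extract_features, matches):
    """Return a (status, registrant) pair mirroring S101-S109.

    detect_face(frame) -> face region or None; extract_features(region) ->
    the visitor's feature data or None; matches(a, b) -> bool.
    """
    # S101/S102: face area detection
    region = detect_face(frame)
    if region is None:
        return ("detection_error", None)          # -> detection error process (S120)
    # S103/S104: feature point extraction (all of eyes, nose, mouth required)
    features = extract_features(region)
    if features is None:
        return ("feature_extraction_error", None)  # -> feature extraction error process (S130)
    # S105/S106: match against each registrant's stored face feature data in order
    for registrant in registrants:
        if matches(features, registrant["face_features"]):
            return ("registered", registrant)      # S107 (previous visit date updated in S108)
    return ("unregistered", None)                  # S109
```

Updating the previous visit date (S108) and the subsequent error-handling branches are performed by the caller depending on the returned status.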
If it is determined in step S101 that no face area could be detected from the generated still image (S101: NO), the failure to recognize the visitor's face is attributed to the failure to detect a face area. Accordingly, on the assumption that a detection error has occurred, the detection error flag stored in the flag storage area of the RAM 203 is turned ON (S111). Then, a detection error process is performed (S120 and FIG. 7). The detection error process is a process that prompts the visitor to take a predetermined action so that the face area can be detected.
As shown in FIG. 7, in the detection error process, it is determined whether a predetermined time has elapsed since the human sensor 115 detected the moving object (S121). Specifically, for example, a timer (not shown) may be started when the detection flag is turned ON in step S4 of the main process shown in FIG. 5 to measure the elapsed time, and it may be determined whether the elapsed time exceeds a predetermined threshold. If the predetermined time has not elapsed (S121: NO), the face area may still be detected within the predetermined time, so the detection error process ends as it is. The process returns to the recognition determination process shown in FIG. 6, and then to the main process shown in FIG. 5.
On the other hand, if it is determined that the predetermined time has elapsed since the visitor was detected (S121: YES), it is determined whether the moving objects detected during the predetermined time are the same object (S122). Specifically, for example, for a plurality of images captured by the camera 113 during the predetermined period, histograms indicating the luminance distribution of each image can be generated, and the determination can be made based on the result of comparing these histograms. If it is determined that the moving objects detected during the predetermined time are not the same object (S122: NO), the detection error process ends as it is. The process returns to the recognition determination process shown in FIG. 6, and then to the main process shown in FIG. 5.
If it is determined that the moving objects detected during the predetermined time are the same object (S122: YES), the moving object is a visitor, but the face area may not have been detected because the visitor is not standing at an appropriate position. Accordingly, a control signal and data for outputting, from the speaker 112 of the slave unit 10, a voice instruction prompting the visitor to take the predetermined action for the case where no face area can be detected are transmitted to the slave unit 10 (S123). Specifically, the audio data of the predetermined instruction voice stored in the notification audio storage area 223 of the flash ROM 220 is read, converted into an audio signal, and transmitted to the slave unit 10.
As a result, the slave unit 10 outputs a voice instructing the visitor to stand at a position where the face can be captured by the camera 113, for example, "If you have business here, please stand in front of the intercom." Thus, by following the voice instruction, the visitor can respond appropriately to the recognition error that the face area cannot be detected.
Further, if it is determined that the moving objects detected during the predetermined time are the same object (S122: YES), there is also a possibility that a suspicious person is loitering and watching the premises. Accordingly, in addition to the voice instruction to the visitor in step S123, recording is performed. However, if recording has already been started in a previous detection error process and is still in progress (S125: YES), the detection error process ends as it is, and the process returns to the recognition determination process shown in FIG. 6. If recording has not yet started (S125: NO), recording for a predetermined time (for example, one minute) is started (S126). Specifically, the CPU 201 acquires the image signal transmitted from the slave unit 10, generates moving image data, and stores it in the moving image storage area 225 of the flash ROM 220.
In this way, when the visitor may be a suspicious person, recording a moving image preserves evidence that may make it possible to identify the person later. Thereafter, the CPU 201 lights the warning lamp 216 (S127) to notify the user of the master unit 20 indoors of the presence of a visitor who cannot be identified. The detection error process then ends. The process returns to the recognition determination process shown in FIG. 6, and then to the main process shown in FIG. 5.
A face area detection error may be caused by the face being too close to the camera 113 and not fitting within the imaging area. Therefore, in step S126, instead of recording, the slave unit 10 may be instructed to capture a still image at the low magnification (wide angle), and a low-magnification still image may be generated from the obtained image signal. By changing the magnification in this way, the indoor responder may become able to recognize the visitor's face by looking at the image.
If it is determined in step S103 of the recognition determination process shown in FIG. 6 that at least one of the eye, nose, and mouth feature points could not be extracted from the face area (S103: NO), the failure to recognize the visitor's face is attributed to the failure to extract the facial feature points. Accordingly, on the assumption that a feature extraction error has occurred, the feature extraction error flag stored in the flag storage area of the RAM 203 is turned ON (S112). Then, a feature extraction error process is performed (S130 and FIG. 8). The feature extraction error process is a process that prompts the visitor to take a predetermined action so that all the facial feature points can be extracted.
As shown in FIG. 8, in the feature extraction error process, first, which region within the face area is hidden is identified (S131). For example, depending on whether the feature points not extracted in the recognition determination process correspond to the eyes, nose, or mouth, it can be identified whether the hidden region is the eye region, the nose region, or the mouth region. Alternatively, the hidden region may be identified based on the luminance value of each pixel in the face area. As for the cases where a region is hidden: when the eye region is identified as hidden, it is assumed that the visitor is wearing sunglasses, for example; when the nose region and mouth region are hidden, it is assumed that the visitor is wearing a mask.
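The first of the two identification methods of S131, inferring the hidden region from which feature points failed to extract, can be sketched as follows. The region names and return strings are illustrative assumptions.

```python
def hidden_regions(extracted_points):
    """Sketch of S131: infer which part of the face is covered from the set of
    feature-point names that were successfully extracted."""
    expected = {"eyes", "nose", "mouth"}
    missing = expected - set(extracted_points)
    if missing == {"eyes"}:
        # Only the eye points failed: sunglasses are assumed.
        return "eye region hidden (sunglasses assumed)"
    if missing == {"nose", "mouth"}:
        # Nose and mouth points failed together: a mask is assumed.
        return "nose and mouth regions hidden (mask assumed)"
    if not missing:
        return "no region hidden"
    return "regions hidden: " + ", ".join(sorted(missing))
```

The result determines which instruction voice is selected later in the process (for example, asking the visitor to uncover the eyes).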
Subsequently, it is determined whether partial matching has been set (S132). Here, partial matching means that, when not all of the feature points corresponding to the eyes, nose, and mouth could be extracted, the extracted portion of the data is matched against the corresponding portion of the registrants' face feature data. The matching condition, which is the information indicating whether partial matching is performed, is set by the user and stored in the setting information storage area 224 of the flash ROM 220, as described above.
If the matching condition stored in the setting information storage area 224 indicates that partial matching is not to be performed (S132: NO), a control signal and data for outputting, from the speaker 112 of the slave unit 10, a voice instruction prompting the visitor to take the predetermined action for the case where the facial feature points cannot be extracted are transmitted to the slave unit 10 (S133). Specifically, the audio data of the predetermined instruction voice stored in the notification audio storage area 223 of the flash ROM 220 is read, converted into an audio signal, and transmitted to the slave unit 10. In the slave unit 10, for example, when the eye region is identified as hidden, a voice instructing the visitor to show the hidden region is output, such as "If you have business here, please uncover your eyes." Thus, by following the voice instruction, the visitor can respond appropriately to the recognition error that the facial feature points cannot be extracted.
 続いて、子機10に対して高解像度で撮像する指示が送信される(S141)。具体的には、CPU201は、フラッシュROM220の設定情報記憶エリア224から読み出した高解像度をカメラ113の解像度として設定し、撮像させるための制御信号を、子機10に対して送信する。ここで高解像度での撮像を行うのは、通常解像度では顔の特徴点が抽出できないが、高解像度になれば、室内対応者が目で見て来訪者の顔を認識できる可能性があるからである。そして、CPU201は、子機10から送信された1フレームの画像信号を取得して高解像度の静止画を生成し、フラッシュROM220の静止画記憶エリア226に記憶させる(S142)。 Subsequently, an instruction for imaging at a high resolution is transmitted to the slave unit 10 (S141). Specifically, the CPU 201 sets the high resolution read from the setting information storage area 224 of the flash ROM 220 as the resolution of the camera 113 and transmits a control signal for imaging to the slave unit 10. The reason why high-resolution imaging is performed here is that facial feature points cannot be extracted at normal resolution, but if the resolution is high, indoor responders may be able to visually recognize the faces of visitors. It is. Then, the CPU 201 acquires one frame of image signal transmitted from the child device 10 to generate a high-resolution still image and stores it in the still image storage area 226 of the flash ROM 220 (S142).
 In step S141, instead of instructing high-resolution imaging, the CPU 201 may instruct imaging at the high magnification (zoom) stored in the setting information storage area 224 of the flash ROM 220. When facial feature points cannot be extracted, the cause may be that the visitor is too far from the camera 113. Changing the magnification in this way may therefore allow the resident to recognize the visitor's face by sight.
 Thereafter, the CPU 201 transmits to the slave unit 10 an instruction to return the resolution of the camera 113 to the normal resolution (S143). Specifically, the normal resolution read from the setting information storage area 224 of the flash ROM 220 is set as the resolution of the camera 113, and a control signal for image capture is transmitted to the slave unit 10. The warning lamp 216 is then turned on (S144) to notify the user of the master unit 20 indoors of the presence of a visitor who could not be identified. The feature extraction error process then ends. Processing returns to the recognition determination process shown in FIG. 6, and further to the main process shown in FIG. 5.
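The sequence of steps S133 and S141 to S144 described above might be sketched as follows. This is only an illustration of the described flow, not an implementation from the patent; the class, the function names, and the string-based "frame" representation are all hypothetical.

```python
# Illustrative sketch of the feature-extraction error flow (S133, S141-S144).
# SlaveUnitStub stands in for the outdoor slave unit 10; all names are assumed.

class SlaveUnitStub:
    def __init__(self):
        self.played = []          # voice prompts sent to speaker 112
        self.captured = []        # resolutions actually used for capture

    def play_voice(self, message):
        self.played.append(message)

    def capture(self, resolution):
        self.captured.append(resolution)
        return f"frame@{resolution}"   # one frame of image data


def feature_extraction_error_process(slave, hidden_region, still_image_store):
    # S133: prompt the visitor to uncover the hidden facial region.
    slave.play_voice(f"Please uncover your {hidden_region} and face the camera.")
    # S141-S142: capture one high-resolution frame and store it as a still image.
    still_image_store.append(slave.capture("high"))
    # S143: return the camera to normal resolution.
    slave.capture("normal")
    # S144: light the warning lamp on the master unit (represented by a flag).
    return {"warning_lamp": True}
```

A zoom-based variant of S141 (as suggested in the text) would simply substitute a magnification setting for the resolution argument.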
 If it is determined in step S132 that the matching condition stored in the setting information storage area 224 indicates that partial matching is to be performed (S132: YES), matching against the registrants' facial feature data is performed using only the subset of feature points that could be extracted (S135). For example, during the spring pollen season, many people wear masks as a countermeasure against hay fever. It would be harsh to instruct such a visitor to remove the mask and show the nose and mouth. In such a case, therefore, the matching condition can be set in advance so that partial matching is performed.
 In partial matching, for example, when the feature points corresponding to the eyes cannot be extracted and only the feature points corresponding to the nose and mouth can be extracted, the data obtained from the nose and mouth feature points is matched against the nose and mouth portions of the registrants' facial feature data. If the data matches none of the registrants (S136: NO), then, as described above, an instruction to capture an image at high resolution is transmitted to the slave unit 10 (S141), the high-resolution still image is stored in the still image storage area 226 (S142), and the warning lamp 216 is turned on (S144). The feature extraction error process then ends, processing returns to the recognition determination process shown in FIG. 6, and further to the main process shown in FIG. 5.
 If, in step S136, matching against the facial feature data using only the subset of extracted feature points results in a determination that the data matches one of the registrants' facial feature data (S136: YES), that registrant is identified as a candidate for the visitor (S137). The registrant is identified only as a "candidate" here because the matching result is based on only a portion of the facial feature points, so its reliability is not as high as when all the feature points are used. The feature extraction error process then ends, processing returns to the recognition determination process shown in FIG. 6, and further to the main process shown in FIG. 5.
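Partial matching (S135 to S137) can be sketched as below. The patent does not specify a feature representation or distance metric, so the coordinate-based features, the average-distance score, and the threshold here are purely illustrative assumptions.

```python
# Hypothetical sketch of partial matching: only the regions that were actually
# extracted from the visitor's face are compared against the corresponding
# portion of each registrant's stored feature data.

def partial_match(extracted, registrants, threshold=1.0):
    """extracted: e.g. {"nose": (x, y), "mouth": (x, y)} when the eye points
    are hidden.  registrants: id -> full per-region feature dict."""
    for reg_id, features in registrants.items():
        # Compare only the regions present in 'extracted' (the partial set).
        dist = 0.0
        for region, (x, y) in extracted.items():
            rx, ry = features[region]
            dist += ((x - rx) ** 2 + (y - ry) ** 2) ** 0.5
        if dist / len(extracted) < threshold:
            return reg_id   # S137: identified only as a *candidate*
    return None             # S136: NO -> high-resolution capture (S141) follows
```

A match found this way is deliberately reported as a candidate, mirroring the reduced reliability noted in the text.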
 In the main process shown in FIG. 5, after the recognition determination process (S10), it is determined whether the call button 114 of the slave unit 10 has been pressed by the visitor (S11). Specifically, it is determined whether a call signal transmitted from the slave unit 10 has been received. If the call button 114 has not been pressed by the visitor (S11: NO), processing returns to step S2 so that processing continues until the visitor presses the call button 114.
 In the subsequent step S2, it is determined whether the detection flag is ON. When processing returns to step S2 after step S11, a moving object has already been detected once, so the detection flag is ON (S2: YES). It is then determined whether a moving object is detected by the human presence sensor 115 (S23). If no moving object is detected here (S23: NO), this means that the once-detected moving object left the detection range of the human presence sensor 115 while the subsequent processing was being performed. In this case, it is therefore assumed that the visitor has left, and processing returns to step S1. The information stored in the RAM 203 is erased, the various flags are turned OFF, and processing for the next visitor is performed as described above.
 On the other hand, if a moving object is detected in step S23 (S23: YES), this means that the moving object remains within the detection range, that is, the visitor is still near the slave unit 10. It is then determined whether a visitor or a visitor candidate was already identified in the preceding processing (S7). If one has already been identified, there is no need to perform the recognition determination process again, so processing proceeds directly to step S11, where it is determined whether the call button 114 has been pressed (S11). If, on the other hand, no visitor or visitor candidate has yet been identified (S7: NO), the recognition determination process is performed again (S10). The same processing is repeated until the call button 114 is pressed by the visitor (S2 to S10).
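The loop around steps S2, S7, S10, S11, and S23 amounts to a small state machine: re-run recognition only while no one has been identified, reset if the moving object leaves, and exit to the visitor process on a button press. The following is a minimal sketch of that logic under an assumed event model; none of the names come from the patent.

```python
# One iteration of the main-loop logic (S2/S23/S7/S10/S11), sketched.
# 'recognize' stands in for the whole recognition determination process (S10).

def main_loop_step(state, sensor_detects, button_pressed, recognize):
    """Returns 'reset', 'visitor', or 'continue'."""
    if state["detection_flag"] and not sensor_detects:
        # S23: NO - the moving object left the detection range: start over (S1).
        state.clear()
        state.update({"detection_flag": False, "identified": None})
        return "reset"
    state["detection_flag"] = True            # S2 bookkeeping
    if state["identified"] is None:           # S7: NO - run recognition again (S10)
        state["identified"] = recognize()
    if button_pressed:                        # S11: YES - enter visitor process (S20)
        return "visitor"
    return "continue"
```

Injecting the sensor reading and recognition result as arguments keeps the sketch testable without real hardware.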
 If, after the recognition determination process (S10), the call button 114 has been pressed (S11: YES), the visitor process is performed (S20 and FIG. 9). The details of the visitor process are described below with reference to FIG. 2 and FIGS. 9 to 11.
 As shown in FIG. 9, in the visitor process the CPU 201 first outputs a ring tone from the speaker 212 of the master unit 20 and, based on display data generated from the image signal transmitted from the slave unit 10, displays the image captured by the camera 113 on the display monitor 214, for example as shown in FIG. 2 (S201). It is further determined whether a visitor or a visitor candidate has been identified, in order to present a display on the operation panel 215 according to the visitor's recognition status (S202).
 As described above, when the visitor has been identified as one of the registrants in the recognition determination process (S107 in FIG. 6), when the visitor has been identified as an unregistered person (S109), or when one of the registrants has been identified as a visitor candidate in the feature extraction error process (S137 in FIG. 8), it is determined that a visitor or a visitor candidate has been identified (S202: YES). In this case, the CPU 201 displays the registrant mode screen 310 shown in FIG. 2 on the operation panel 215 (S213).
 As shown in FIG. 2, the registrant mode screen 310 is provided with, for example, a visitor information display area 311, a registration edit button 312, a respond button 313, and a refuse button 314. The registrant mode screen 310 may be created by inserting information about the visitor or visitor candidate into a template stored in the display screen storage area 222 of the flash ROM 220 (see FIG. 3). The same applies to the other screens described below.
 The visitor information display area 311 displays information about the visitor or visitor candidate. Specifically, for example, when the visitor has been identified as a registrant, the message "The visitor is the following registrant" is displayed; when a visitor candidate has been identified, the message "The visitor candidate is the following registrant" is displayed. In addition, the registrant's name, relationship, date of last visit, and memo, stored as related information in the facial feature storage area 221, are read out and displayed.
 FIG. 2 is an example of the registrant mode screen 310 in the case where, among the registrants whose data is stored in the facial feature storage area 221 of FIG. 4, the registrant with ID "1" has been identified as the visitor. Accordingly, this registrant's related information is displayed: the visitor's name "Mr. A", the relationship "Dad's acquaintance", the date of last visit "September 1, 2008", and the memo "Ask them to wait until Dad gets home".
 The registration edit button 312 is a button by which the resident inputs an instruction to move to an edit screen when he or she wants to correct the displayed registrant's related information. The respond button 313 is a button for inputting an instruction to start a call with the slave unit 10 when the resident wants to respond to the visitor directly. The refuse button 314 is a button for inputting an instruction for a proxy response when the resident does not want to respond to the visitor directly.
 In this way, when a visitor or a visitor candidate has been identified, the real-time image transmitted from the slave unit 10 is displayed on the display monitor 214, and the related information of the visitor or visitor candidate is displayed on the operation panel 215. The resident can thus easily tell whether the visitor is a known person and, if so, who that person is, which makes it easy to judge whether the visitor is someone to be wary of. Further, when matching has been performed using only a portion of the facial feature points, the visitor is displayed as a candidate, so the resident can easily infer who the visitor is.
 If, on the other hand, it is determined in step S202 that no visitor or visitor candidate has been identified (S202: NO), a control signal and data for outputting, from the speaker 112 of the slave unit 10, a notification voice indicating that connection is in progress are transmitted to the slave unit 10 (S203). Here, for example, the voice message "Connecting. Please wait a moment." is output.
 Next, it is determined whether a feature extraction error occurred in the recognition determination process (S205). Specifically, if not all of the facial feature points could be extracted and the feature extraction error flag is ON (S112 in FIG. 6), it is determined that a feature extraction error occurred (S205: YES). In this case, the CPU 201 displays the caution mode screen 320 shown in FIG. 10 on the operation panel 215 (S206).
 As shown in FIG. 10, the caution mode screen 320 is provided with, for example, a caution information display area 321, a high-resolution image display area 322, a warning release button 323, and a refuse button 314. The caution information display area 321 displays information urging caution regarding the visitor. Specifically, for example, the message "Caution! The visitor needs to be checked" is displayed. In the high-resolution image display area 322, the high-resolution still image that is generated in the feature extraction error process when not all feature points can be extracted (S142 in FIG. 8) and stored in the still image storage area 226 of the flash ROM 220 is read out and displayed.
 The warning release button 323 is a button by which the resident, after checking the normal-resolution real-time image displayed on the display monitor 214 and the high-resolution still image displayed in the high-resolution image display area 322, inputs an instruction to display the respond button 313 described above when he or she wants to respond to the visitor directly. The refuse button 314 is as described in connection with the registrant mode screen 310.
 In this way, when the visitor's face has been detected but not all of the facial feature points can be extracted, the real-time image transmitted from the slave unit 10 is displayed on the display monitor 214, and a high-resolution still image of the imaging area is displayed on the operation panel 215. Thus, even when, for example as shown in FIG. 10, part of the visitor's face is hidden and the visitor cannot be identified by matching against the facial feature data, the resident can look at the high-resolution still image and easily determine whether the visitor is in fact a registrant and whether the visitor is suspicious.
 If it is determined in step S205 that no feature extraction error occurred in the recognition determination process (S205: NO), it is determined whether a detection error occurred (S208). Specifically, if the face region could not be detected and the detection error flag is ON (S111 in FIG. 6), it is determined that a detection error occurred (S208: YES). In this case, the CPU 201 displays the warning mode screen 330 shown in FIG. 11 on the operation panel 215 (S209).
 As shown in FIG. 11, the warning mode screen 330 is provided with, for example, a warning information display area 331, a video display area 332, a transfer button 333, a warning release button 323, and a refuse button 314. The warning information display area 331 displays information warning that the visitor is a person requiring attention. Specifically, for example, the message "Warning!!! A person requiring attention is present" is displayed. In the video display area 332, the video that is recorded in the detection error process when the same object is detected for a predetermined time (S126 in FIG. 7) and stored in the video storage area 225 of the flash ROM 220 is played back.
 The transfer button 333 is a button for inputting an instruction to transfer the voice data input from the slave unit 10 to a predetermined transfer-destination telephone. The warning release button 323 and the refuse button 314 are as described in connection with the caution mode screen 320 and the registrant mode screen 310, respectively.
 In this way, when a moving object has been detected for the predetermined time but the visitor's face region cannot be detected, the real-time image transmitted from the slave unit 10 is displayed on the display monitor 214, and the recorded video is displayed on the operation panel 215. Thus, even when, for example as shown in FIG. 11, a person is loitering, the resident can use both the real-time image from the camera 113 and the recorded video to check whether the person is suspicious.
 If it is determined in step S208 that no detection error occurred in the recognition determination process (S208: NO), the CPU 201 displays an unregistered mode screen (not shown) on the operation panel 215 (S211). The unregistered mode screen is provided with an unregistered person notification area (not shown) in place of the visitor information display area 311 of the registrant mode screen 310 shown in FIG. 2, and indicates that the visitor is an unregistered person. Like the registrant mode screen 310, it is also provided with the respond button 313 and the refuse button 314. In addition, a registration button or the like may be provided for inputting an instruction to move to a screen for newly registering the visitor's facial feature data and the like in the facial feature storage area 221.
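The branch structure of steps S202, S205, S208, and S211 (which screen the operation panel shows) reduces to a small dispatch. The sketch below only summarizes that decision order; the screen labels follow the reference numerals in the text, and everything else is an illustrative assumption.

```python
# Which operation-panel screen is shown, per the S202/S205/S208 branch order.

def select_screen(identified, feature_error, detection_error):
    if identified:               # S202: YES - registrant or candidate known
        return "registrant_mode_310"
    if feature_error:            # S205: YES - face found, feature points missing
        return "caution_mode_320"
    if detection_error:          # S208: YES - no face region for a set time
        return "warning_mode_330"
    return "unregistered_mode"   # S211: face recognized but not a registrant
```

Note that the order matters: identification takes precedence over either error flag, matching the flowchart described in the text.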
 As described above, after a display according to the visitor's recognition status has been presented on the operation panel 215 (S213, S206, S209, or S211), it is determined whether a panel operation has been performed, specifically, whether input from the touch pad of the operation panel 215 has been detected (S216). If no panel operation has been performed (S216: NO), it is determined whether a predetermined time (for example, one minute) has elapsed (S217). For example, the elapsed time may be measured by a timer started when the press of the call button 114 is detected, and the determination may be made based on whether a threshold has been exceeded. While the predetermined time has not elapsed (S217: NO), the CPU 201 returns to the determination of whether a panel operation has been performed (S216) and repeats the processing.
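The S216/S217 wait can be realized as a polling loop with a timeout measured from the call-button press. The sketch below is one possible realization; the injected clock and input source are assumptions made so the loop can be exercised without hardware.

```python
# Poll for panel input until a timeout (S216/S217), sketched with injected
# dependencies: get_input() returns an operation or None, now() returns seconds.

def wait_for_panel_input(get_input, now, start, timeout=60.0):
    """Returns the panel operation, or None if the timeout expires first."""
    while True:
        op = get_input()
        if op is not None:                 # S216: YES - a panel operation arrived
            return op
        if now() - start >= timeout:       # S217: YES - nobody answered indoors
            return None
```

Returning None corresponds to ending the visitor process and resetting for the next visitor (S1).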
 If the predetermined time has elapsed (S217: YES), it is presumed that no one is indoors and no response will be made, so the visitor process ends as it is and processing returns to the main process of FIG. 5. In this case, in the main process, a reset process is performed for the next visitor (S1).
 If it is determined that input has been made from the operation panel 215 (S216: YES), it is determined, based on the screen being displayed and the input position detected by the touch pad, whether the input instruction is an instruction to start a call by selection of the respond button 313 (see FIG. 2) (S221). If the respond button 313 has been selected (S221: YES), the CPU 201 performs call start processing (S222). Specifically, the CPU 201 separately starts a program for controlling the operation of the master unit 20 relating to calls with the slave unit 10. A speech path is thereby formed between the slave unit 10 and the master unit 20, enabling a call between the visitor and the resident. After the call start processing (S222), the visitor process ends and processing returns to the main process of FIG. 5, where a reset process is performed for the next visitor (S1).
 If the instruction input from the operation panel 215 is not an instruction to start a call by selection of the respond button 313 (S221: NO), it is determined whether it is an instruction for a proxy response by selection of the refuse button 314 (see FIGS. 2, 10, and 11) (S224). If the refuse button 314 has been selected (S224: YES), the CPU 201 performs proxy response processing (S225). Specifically, a control signal and data for outputting, from the speaker 112 of the slave unit 10, a notification voice indicating that a direct response is not possible are transmitted to the slave unit 10 (S225). As a result, the slave unit 10 outputs, for example, the voice message "We are busy right now, so we are sorry but we cannot answer." The visitor process then ends and processing returns to the main process of FIG. 5, where a reset process is performed for the next visitor (S1).
 If the instruction input from the operation panel 215 is not an instruction for a proxy response by selection of the refuse button 314 (S224: NO), it is determined whether it is a transfer instruction by selection of the transfer button 333 (see FIG. 11) (S227). If the transfer button 333 has been selected (S227: YES), the CPU 201 performs transfer processing (S228). Specifically, a call is placed to a telephone number stored in advance as the transfer destination in a predetermined storage area (not shown) of the flash ROM 220. When the corresponding telephone (which may be a mobile phone or a fixed-line phone) answers and a call connection is established with the master unit 20 via the public telephone network 5, a transferred call between the slave unit 10 and the telephone becomes possible via the master unit 20. The visitor process then ends and processing returns to the main process of FIG. 5, where a reset process is performed for the next visitor (S1).
 If the instruction input from the operation panel 215 is not a transfer instruction by selection of the transfer button 333 (S227: NO), other processing according to the input instruction is performed. For example, if the warning release button 323 is selected on the caution mode screen 320 shown in FIG. 10 or the warning mode screen 330 shown in FIG. 11, processing is performed to display the respond button 313 (see FIG. 2) in place of the warning release button 323. After the other processing ends, processing returns to step S216 and, as described above, processing according to operations on the operation panel 215 is performed.
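The chain of checks in S221 to S228 amounts to dispatching the panel input to one of a few handlers. The sketch below mirrors the buttons 313 (respond), 314 (refuse), and 333 (transfer); the handler bodies and the log list are stand-ins, since the patent describes the resulting behavior rather than code.

```python
# Dispatch of a panel operation (S221/S224/S227), sketched.

def handle_panel_input(op, log):
    if op == "respond":        # S221-S222: open the speech path to slave unit 10
        log.append("call_started")
        return "done"
    if op == "refuse":         # S224-S225: play the proxy-response message
        log.append("proxy_response_played")
        return "done"
    if op == "transfer":       # S227-S228: dial the stored forwarding number
        log.append("call_transferred")
        return "done"
    log.append("other:" + op)  # e.g. warning release -> show the respond button
    return "back_to_S216"
```

"done" corresponds to ending the visitor process and resetting (S1); "back_to_S216" corresponds to returning to the panel-input wait.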
 As described above, in the intercom system 1 of this embodiment, when a visitor is detected, imaging by the camera 113 of the slave unit 10 is started. A face region is then detected from a still image generated from the image signal acquired from the slave unit 10, and feature points are extracted from the face region. If the face region cannot be detected, or if the feature points cannot be extracted, face recognition by matching against the facial feature data of the plurality of persons stored in advance in the facial feature storage area 221 of the flash ROM 220 is impossible. Moreover, in such cases, even if the resident looks at the image captured by the camera 113, the visitor's face often cannot be recognized.
 Accordingly, an instruction prompting the visitor to take an action appropriate to the cause hindering recognition is output as voice from the speaker 112, thereby notifying the visitor, who can then follow the notified instruction and respond appropriately to that cause. In addition, the screens 310 to 330, which present information urging different levels of caution according to the cause hindering recognition, are displayed on the operation panel 215, so the resident can take appropriate precautions according to the cause based on the notified information.
 In particular, when the visitor's face cannot be recognized from the acquired image, the cause is usually either that the visitor's face region cannot be detected or that the visitor's facial features cannot be extracted. According to the intercom system 1, the content of the notification to the visitor and the level of caution notified indoors are changed according to which of these two causes applies, so that appropriate instructions can be given to the visitor and appropriate warnings can be given indoors.
 Further, in the intercom system 1, when the visitor's face cannot be recognized, the imaging method of the camera 113 (for example, resolution, angle of view, still image or video) is changed according to the cause, and the acquired image is displayed on the operation panel 215 of the master unit 20. Therefore, when urged to be cautious, the resident can check a different image depending on the cause of the face recognition failure and can take more appropriate precautions.
 Furthermore, in the intercom system 1, the visitor's facial feature data is matched against the facial feature data of the plurality of registrants stored in the facial feature storage area 221 of the flash ROM 220, and it is determined whether there is a matching person. Information urging different levels of caution according to the determination result is then notified indoors through the operation panel 215. The resident can therefore know whether the visitor is a person to be wary of, which makes it easy to avoid a careless response.
 The configurations and processing shown in the above embodiment are examples, and it goes without saying that various modifications are possible. For example, in the above embodiment, when the visitor's facial feature data is obtained in the recognition determination process (see FIG. 6), identification of the visitor is attempted by matching against the registrants' facial feature data stored in the facial feature storage area 221 of the flash ROM 220. However, identification of the visitor by matching facial feature data need not necessarily be performed. That is, in the recognition determination process of FIG. 6, the processing of steps S104 to S109 may be omitted. In this case, the processing of step S7 of the main process shown in FIG. 5 and of steps S202 and S213 of the visitor process shown in FIG. 9 need not be performed; that is, processing proceeds from step S201 directly to step S203. The processing of step S211 may also be omitted.
 In other words, if a feature extraction error or a detection error occurs, the attention mode screen 320 (see FIG. 10) or the warning mode screen 330 (see FIG. 11) is displayed according to the type of error, whereas if detection of the face area and extraction of the facial feature points both succeed, the image captured by the camera 113 is simply displayed on the display monitor 214 (S201 in FIG. 9). Even in this case, because the face area has been detected and the facial feature points extracted, the person responding indoors can recognize the visitor's face by looking at the image shown on the display monitor 214.
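The screen selection just described reduces to a small dispatch. Note one assumption: the text does not state which error maps to which screen, so the pairing below (detection error → warning mode 330, extraction error → attention mode 320) and the function name are illustrative guesses.

```python
def select_screen(face_detected, features_extracted):
    """Choose what the display monitor 214 shows.

    Assumed mapping: a detection error (no face area found) is treated
    as the more serious case and triggers the warning mode screen 330;
    a feature extraction error triggers the attention mode screen 320;
    full success simply shows the camera image (S201 in FIG. 9).
    """
    if not face_detected:
        return "warning_mode_screen_330"
    if not features_extracted:
        return "attention_mode_screen_320"
    return "camera_image"
```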
 In the above embodiment, when the visitor's face cannot be recognized, information prompting a different action depending on the cause is announced by voice from the speaker 112 of the slave unit 10. However, the visitor need not be notified by voice. Instead, the slave unit 10 may be provided with a display monitor on which, when the visitor's face cannot be recognized, information prompting a different action depending on the cause is displayed.
 In the above embodiment, information prompting different levels of caution is conveyed to the person responding indoors by displaying different screens 310 to 330 on the operation panel 215 of the master unit 20, depending on why the visitor's face cannot be recognized. However, the indoor responder need not be notified by a display. Instead, messages prompting different levels of caution, such as "This is an acquaintance of Dad," "Please check the image carefully," or "This appears to be a suspicious person; please be on your guard," may be output as audio from the speaker 212 of the master unit 20. Alternatively, the warning lamp 216 may be made to blink at a different interval for each caution level, or multiple warning lamps 216 with mutually different emission colors may be provided so that a lamp of a different color is lit for each level.
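The alternatives above amount to a per-level notification table. In this sketch the messages follow the examples given in the text, but the level names, blink intervals, and lamp colors are invented for illustration:

```python
# Assumed caution levels and their notifications.  Messages mirror the
# examples in the text; intervals and colors are illustrative only.
ALERT_TABLE = {
    "known": {
        "message": "This is an acquaintance of Dad.",
        "blink_interval_s": None,   # lamp steady, no blinking
        "lamp_color": "green",
    },
    "caution": {
        "message": "Please check the image carefully.",
        "blink_interval_s": 1.0,
        "lamp_color": "yellow",
    },
    "suspicious": {
        "message": "This appears to be a suspicious person; "
                   "please be on your guard.",
        "blink_interval_s": 0.3,    # faster blink for higher alert
        "lamp_color": "red",
    },
}


def notify_indoors(level):
    """Return (speaker message, blink interval, lamp color) for a level."""
    entry = ALERT_TABLE[level]
    return entry["message"], entry["blink_interval_s"], entry["lamp_color"]
```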
 In the above embodiment, a moving object in front of the slave unit 10 is detected by the motion sensor 115, but the motion sensor 115 need not necessarily be provided in the slave unit 10. Instead, a moving object may be detected from changes between images captured by the camera 113.
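The image-change alternative can be sketched as simple frame differencing, assuming grayscale frames; both thresholds below are illustrative assumptions, not values from the patent:

```python
def frame_difference(prev_frame, cur_frame, pixel_threshold=30,
                     changed_fraction=0.05):
    """Detect a moving object by comparing two grayscale frames.

    Frames are lists of rows of 0-255 pixel values.  Motion is reported
    when more than `changed_fraction` of the pixels differ by more than
    `pixel_threshold`.  A production system would instead use a proper
    background-subtraction method, but the principle is the same.
    """
    total = changed = 0
    for row_prev, row_cur in zip(prev_frame, cur_frame):
        for a, b in zip(row_prev, row_cur):
            total += 1
            if abs(a - b) > pixel_threshold:
                changed += 1
    return total > 0 and changed / total > changed_fraction
```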
 In the above embodiment, the master unit 20 includes the flash ROM 220, and audio output at the slave unit 10 is performed by transmitting the instruction voice data and the like stored there to the slave unit 10. However, the instruction voice data need not be stored in the master unit 20; the slave unit 10 may instead be provided with its own flash ROM in which the data is stored. In that case, only an instruction specifying the data is transmitted from the master unit 20 to the slave unit 10, and the CPU 101 of the slave unit 10 outputs the audio in accordance with that instruction.
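The instruction-only variant can be sketched as a lookup protocol: the master unit sends just an identifier, and the slave unit resolves it against its own storage. The class name, prompt identifiers, and placeholder audio bytes are all assumptions for the sketch:

```python
class SlavePromptStore:
    """Stands in for a flash ROM on the slave unit holding voice data."""

    def __init__(self):
        # Placeholder audio payloads keyed by hypothetical prompt IDs.
        self._prompts = {
            "face_not_visible": b"<voice: please face the camera>",
            "too_dark":         b"<voice: please step into the light>",
        }

    def play(self, prompt_id):
        """Return the audio the slave unit would output for an ID."""
        return self._prompts[prompt_id]


def master_request(store, prompt_id):
    """Model of the master unit transmitting only the ID, not the audio."""
    return store.play(prompt_id)
```

The benefit of this split is that only a short identifier crosses the master-slave link, rather than the full audio data.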

Claims (6)

  1. An intercom system comprising a slave unit installed outdoors and a master unit connected to the slave unit and installed indoors, wherein
     the slave unit comprises:
     imaging means for imaging a predetermined imaging range in front of the slave unit and outputting captured image information, which is information on an image of the imaging range;
     visitor detection means for detecting a visitor; and
     slave-unit notification means for notifying the visitor of information; and
     the master unit comprises:
     image acquisition means for acquiring the captured image information output from the imaging means when the visitor is detected by the visitor detection means;
     recognition determination means for determining, based on the captured image information acquired by the image acquisition means, whether the visitor's face is recognizable;
     slave-unit notification control means for, when the recognition determination means determines that the visitor's face is not recognizable, controlling the slave-unit notification means to notify the visitor of information prompting a different action depending on the cause preventing recognition;
     master-unit notification means for notifying the room of information; and
     first master-unit notification control means for controlling the master-unit notification means to notify information prompting a different level of caution depending on the cause.
  2. The intercom system according to claim 1, wherein the recognition determination means comprises:
     first determination means for determining, based on the captured image information acquired by the image acquisition means, whether a face area, which is an area corresponding to the visitor's face, exists within the imaging range;
     second determination means for determining, when the first determination means determines that the visitor's face area exists within the imaging range, whether visitor face features, which are features of the visitor's face, can be extracted based on the captured image information; and
     cause identification means for identifying, when either the determination result of the first determination means or that of the second determination means is negative, the cause in accordance with the determination result of the first determination means or the second determination means.
  3. The intercom system according to claim 1 or 2, wherein the imaging means is capable of imaging at different resolutions, imaging at different angles of view, and capturing both still images and moving images, and
     the master unit further comprises:
     imaging control means for, when the recognition determination means determines that the visitor's face is not recognizable, controlling the imaging means to change its imaging method depending on the cause;
     image display means for displaying an image; and
     display control means for controlling the image display means to display an image of the imaging range based on the captured image information acquired by the image acquisition means.
  4. The intercom system according to any one of claims 1 to 3, wherein the master unit further comprises:
     face feature extraction means for extracting, when the recognition determination means determines that the visitor's face is recognizable, visitor face features, which are features of the visitor's face, based on the captured image information;
     match determination means for comparing the visitor face features against a plurality of face features stored in face feature storage means, which stores the plurality of face features in association with identification information of a plurality of persons, and determining whether any of the plurality of face features matches the visitor face features; and
     second master-unit notification control means for controlling the master-unit notification means to notify information prompting a different level of caution depending on the determination result of the match determination means.
  5. The intercom system according to claim 4, wherein the master unit further comprises:
     visitor identification means for identifying one of the plurality of persons as the visitor, based on the face features determined by the match determination means to match the visitor face features; and
     third master-unit notification control means for controlling the master-unit notification means to notify information on the visitor identified by the visitor identification means.
  6. The intercom system according to claim 4 or 5, wherein, when part of the visitor face features has been extracted by the face feature determination means, the match determination means performs its determination based on parts of the plurality of face features and the extracted part of the visitor face features, and
     the master unit further comprises:
     candidate identification means for identifying one of the plurality of persons as a candidate for the visitor, based on the parts of the plurality of face features determined by the match determination means to match part of the visitor face features; and
     fourth master-unit notification control means for controlling the master-unit notification means to notify information on the visitor candidate identified by the candidate identification means.
PCT/JP2009/054739 2008-09-23 2009-03-12 Intercom system WO2010035524A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2008-243428 2008-09-23
JP2008243428A JP2010080993A (en) 2008-09-23 2008-09-23 Intercom system

Publications (1)

Publication Number Publication Date
WO2010035524A1 true WO2010035524A1 (en) 2010-04-01

Family

ID=42059534

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2009/054739 WO2010035524A1 (en) 2008-09-23 2009-03-12 Intercom system

Country Status (2)

Country Link
JP (1) JP2010080993A (en)
WO (1) WO2010035524A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2017147646A (en) * 2016-02-18 2017-08-24 アイホン株式会社 Intercom system
TWI740475B (en) * 2020-04-29 2021-09-21 中華電信股份有限公司 Guest authentication method and guest authentication device

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2013042357A (en) * 2011-08-16 2013-02-28 Misawa Homes Co Ltd Intercom system
JP5518919B2 (en) * 2012-02-29 2014-06-11 株式会社東芝 Face registration device, program, and face registration method
WO2014045670A1 (en) * 2012-09-19 2014-03-27 日本電気株式会社 Image processing system, image processing method, and program
KR101492506B1 (en) * 2013-08-09 2015-02-12 주식회사 시큐인포 Security system of apartment house using home blackbox, and management method thereof
JP6614473B2 (en) * 2015-02-25 2019-12-04 パナソニックIpマネジメント株式会社 Interphone database generation device, interphone indoor device, interphone system, display method, and program using the same
JP2016154327A (en) * 2016-02-01 2016-08-25 パナソニックIpマネジメント株式会社 Intercom system and communication method
JP6773493B2 (en) * 2016-09-14 2020-10-21 株式会社東芝 Detection device, detection method, and detection program
JP6989109B2 (en) * 2017-10-16 2022-01-05 株式会社パロマ Cooker
JP2019083468A (en) * 2017-10-31 2019-05-30 シャープ株式会社 Output control device, intercom slave unit, and intercom system
JP7209206B2 (en) * 2018-06-19 2023-01-20 パナソニックIpマネジメント株式会社 INTERCOM ENTRANCE DEVICE, INTERCOM SYSTEM, CONTROL METHOD, AND PROGRAM
JP6964350B2 (en) * 2018-10-30 2021-11-10 株式会社Receptionist Reception system, reception program and reception method
JP7312061B2 (en) * 2019-09-02 2023-07-20 アイホン株式会社 intercom device
JP7127864B2 (en) * 2020-02-18 2022-08-30 アイメソフト ジェイエスシー Information processing method, information processing device and program
KR102461858B1 (en) * 2021-04-28 2022-11-01 주식회사 대림 Doorphone Capable of Recognizing objects, System Including the Same, and Method of Using the Same

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000163600A (en) * 1998-11-18 2000-06-16 Sintec Co Ltd Face photographing and recognizing method and device
JP2007037088A (en) * 2005-06-24 2007-02-08 Matsushita Electric Ind Co Ltd Intercom device
JP2007048209A (en) * 2005-08-12 2007-02-22 Fujifilm Holdings Corp Crime prevention apparatus, vending machine, crime prevention method, and crime prevention program
JP2007049551A (en) * 2005-08-11 2007-02-22 Fujifilm Holdings Corp Device, method, and program for crime prevention
JP2007257221A (en) * 2006-03-23 2007-10-04 Oki Electric Ind Co Ltd Face recognition system

Also Published As

Publication number Publication date
JP2010080993A (en) 2010-04-08

Similar Documents

Publication Publication Date Title
WO2010035524A1 (en) Intercom system
US8606316B2 (en) Portable blind aid device
CN107408028A (en) Message processing device, control method and program
JP2010079609A (en) Personal authentication device, personal authentication program, and intercom system with personal authentication device
JP2010226541A (en) Reception apparatus, visitor reception method, and visitor reception control program
JP2007037088A (en) Intercom device
JP2008113396A (en) Intercom system and entrance subunit
JP2010074628A (en) Intercom system
JP2010114544A (en) Intercom system, and program and method for receiving visitor
JP2010062797A (en) Intercom system
JP5579565B2 (en) Intercom device
KR20110137469A (en) Intelligent entrance managing apparatus using face detection and entrance managing method thereof
KR101720762B1 (en) Home-network system for Disabled
JP2010152423A (en) Personal authentication device, personal authentication method and personal authentication program
JP4889568B2 (en) Imaging device and portable terminal device
JP2017108343A (en) Intercom system and entrance unit
JP2011035644A (en) Intercom system
JP6804510B2 (en) Detection system and display method of detection system
JP2012204949A (en) Intercom system, intercom outdoor apparatus, control method, and program
JP2007150511A (en) Intercom system
JP2004266714A (en) Personal identification device
CN111479060B (en) Image acquisition method and device, storage medium and electronic equipment
JP2007096831A (en) Interphone system
JP2010200184A (en) Communication device, communication control method, and communication control program
JP2007104380A (en) Doorphone unit

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 09815943

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 09815943

Country of ref document: EP

Kind code of ref document: A1