JP2005109612A

JP2005109612A - Interphone apparatus, voice conversion method in interphone apparatus, voice conversion program for interphone apparatus

Info

Publication number: JP2005109612A
Application number: JP2003336881A
Authority: JP
Inventors: Nobuhiko Takehara; 伸彦竹原; Tomoki Watabe; 智樹渡部
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2003-09-29
Filing date: 2003-09-29
Publication date: 2005-04-21
Anticipated expiration: 2023-09-29
Also published as: JP3908212B2

Abstract

PROBLEM TO BE SOLVED: To convert the voice of the responding user through an interphone depending on the person captured in a video image.

SOLUTION: When an interphone apparatus 1 receives the video image of a visitor from an interphone terminal 2, a person detecting means 13 detects the person, and a feature amount extracting means 14 extracts the feature amount of the person and checks whether the person is a registered visitor by referring to a user DB 151. If the person is a first-time visitor, the features of the person and the user's voice conversion request information are registered in the user DB 151. If the visitor is already registered in the user DB 151, a voice converting means 12 converts the voice from a user terminal 3 based on the registered voice conversion request information and outputs the converted voice to the interphone terminal 2. Alternatively, the user may register voice conversion conditions in advance in the user DB 151 of the interphone apparatus 1.

COPYRIGHT: (C)2005, JPO&NCIPI

Description

The present invention relates to an intercom device used for communication between an entrance and the interior of a home, and in particular to an intercom device that automatically converts the user's voice according to the visitor before outputting it, a voice conversion method in the intercom device, and a voice conversion program for the intercom device.

Door phone devices with cameras have long existed. For example, Patent Document 1 below describes a camera door phone device that prevents the image fluctuations that occur during color adjustment and similar processing when the video camera is powered on from being output to the monitor television in the dwelling unit. Although it does not concern intercom devices, a technique for estimating gender and age from face image data is described in Patent Document 2 below.
Patent Document 1: JP-A-8-46954
Patent Document 2: JP 2003-99779 A

With conventional intercoms, it was not possible to convert the voice of the person responding indoors according to the person captured in the video. As a result, when an intercom is used for communication between the entrance and the interior, an outsider can learn through the intercom whether the resident is, for example, a woman or a child, which poses a security problem.

The present invention solves the above problems of the prior art. Its object is to provide an effective means of deterring door-to-door salespeople and others who repeatedly visit the same home targeting women, by converting the responding user's voice according to the person captured in the video.

To solve the above problem, the present invention provides an intercom device used for communication between an entrance and the interior of a home, comprising: person detection means for detecting a person from video; feature quantity extraction means for extracting a feature quantity of the detected person; storage means for registering and storing the extracted feature quantity of the person together with the user's past voice conversion request for that person; and voice conversion means for collating the registered feature quantity with the feature quantity of a person extracted from video and converting the user's voice based on the registered voice conversion request.

The present invention also provides an intercom device used for communication between an entrance and the interior of a home, comprising: storage means for registering and storing preset voice conversion conditions; person detection means for detecting a person from video; feature quantity extraction means for extracting a feature quantity of the detected person; and voice conversion means for converting the user's voice by matching the extracted feature quantity of the person against the voice conversion conditions.

The present invention also provides a voice conversion method in an intercom device used for communication between an entrance and the interior of a home, the method comprising: detecting a person from video; extracting a feature quantity of the detected person; registering the extracted feature quantity of the person together with the user's past voice conversion request for that person; and collating the registered feature quantity with the feature quantity of a person extracted from video and converting the user's voice based on the registered voice conversion request.

The present invention also provides a voice conversion method in an intercom device used for communication between an entrance and the interior of a home, the method comprising: registering and storing voice conversion conditions set in advance by the user; detecting a person from video; extracting a feature quantity of the detected person; and converting the user's voice by matching the extracted feature quantity of the person against the voice conversion conditions.

The processing of the voice conversion method described above can be implemented by a microcomputer provided in the intercom device together with a software program, and the program can be recorded on a computer-readable recording medium.

According to the present invention, the responding user's voice can be converted according to the person captured in the video. As a result, the visitor cannot tell through the intercom that the person currently at home is, for example, a woman or a child, which improves home security.

FIG. 1 is a diagram showing an example system configuration of the present invention. Reference numeral 1 denotes an intercom device, 2 denotes a camera-equipped intercom terminal installed at the entrance, and 3 denotes a user terminal installed indoors, with which the person at home responds to a visitor.

The intercom device 1 comprises transmission/reception means 11 for exchanging information with the intercom terminal 2 and the user terminal 3, voice conversion means 12 for converting the user's voice, person detection means 13 for detecting a person and a face image from video captured by the intercom terminal 2, feature quantity extraction means 14 for extracting feature quantities such as the detected person's face, gender, age, and height, and storage means 15. Within the storage means 15, reference numeral 151 denotes a user database (DB) in which the feature quantities of detected persons and user setting information are registered. Each of these means is implemented by a microcomputer, a software program, and the like. The intercom device 1 may be housed in the same enclosure as the user terminal 3.

The intercom terminal 2, with which the visitor communicates, and the user terminal 3, operated by the user, are connected to the intercom device 1 via the transmission/reception means 11. The intercom terminal 2 comprises transmission/reception means 21, video imaging means 22 for capturing video of the visitor, calling means (call button) 23, and voice call means 24. The user terminal 3 comprises transmission/reception means 31, display means 32, operation means 33 operable by the user, and voice call means 34.

[First Embodiment]
FIG. 2 is a diagram showing the first embodiment of the present invention. In this example, when a person visits the user's home for the first time, the user looks at the visitor and then decides whether to convert the voice. When an already registered person visits, the voice is converted automatically based on the registered information.

When a visitor appears at the user's home, the intercom terminal 2 starts capturing video (step S201), and when the visitor presses the call button, it transmits a call signal to the intercom device 1 (step S202).

On receiving the video and the call signal from the intercom terminal 2 (step S101), the intercom device 1 detects a person and the person's face in the video (step S102). For this person detection, a method such as detecting a moving object by background subtraction and then locating the face within the detected moving object may be used. Next, the intercom device 1 extracts feature quantities for face recognition from the detected face (step S103) and queries the user DB 151 with the extracted feature quantities to check whether the visitor is registered (step S104). From the query result it determines whether the visitor is an already registered person, then transmits that information to the user terminal 3 for presentation to the user (step S105).
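The query in step S104 can be pictured with the following minimal sketch. The source does not specify how feature quantities are compared, so this assumes they are numeric vectors matched by Euclidean distance under a threshold. The names `UserDB`, `VisitorRecord`, and `threshold` are hypothetical and do not come from the patent.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class VisitorRecord:
    features: List[float]   # face feature vector stored at registration (assumed format)
    convert_voice: bool     # the user's past voice conversion request for this visitor


class UserDB:
    """Hypothetical stand-in for the user DB 151."""

    def __init__(self, threshold: float = 0.5) -> None:
        self.records: List[VisitorRecord] = []
        self.threshold = threshold  # assumed maximum distance for a match

    def register(self, features: Sequence[float], convert_voice: bool) -> None:
        self.records.append(VisitorRecord(list(features), convert_voice))

    def lookup(self, features: Sequence[float]) -> Optional[VisitorRecord]:
        """Return the closest registered visitor within the threshold, else None."""
        best, best_dist = None, self.threshold
        for rec in self.records:
            d = math.dist(rec.features, features)
            if d <= best_dist:
                best, best_dist = rec, d
        return best
```

A return value of `None` corresponds to a first-time visitor, for whom the user is asked whether to convert the voice.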

Next, while receiving and displaying the video of the visitor, the user terminal 3 presents the query result indicating whether the person in the video is a previous visitor (step S301). After the user confirms whether the person is already registered (step S302), if the person is registered, the user terminal 3 transmits a user response signal to the intercom device 1 so that the user can talk with the visitor (step S303). The intercom device 1 queries the user DB 151 and decides the voice conversion method based on the previous handling, that is, information on how the voice was converted on earlier occasions (step S106).

If the determination in step S302 finds no registration (a first-time visitor), the user terminal 3 accepts, via the operation means 33, the user's instruction on whether to convert the voice (step S304). If the user wants to respond to this visitor with a converted voice, the user terminal 3 transmits a voice conversion request signal to the intercom device 1 (step S305) and transmits a user response signal to the intercom terminal 2 (step S306).

The intercom device 1 receives the voice conversion request signal (or a voice non-conversion request signal) from the user terminal 3 and registers it in the user DB 151 (step S107). Voice conversion here means altering the user's voice, for example by converting its frequency so that a woman's or a child's voice approaches that of an adult man.
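As a rough illustration of frequency conversion, the sketch below lowers pitch by resampling with linear interpolation. This is an assumed technique chosen for brevity, since the source does not name a conversion algorithm. Played back at the original sample rate, the output has every frequency multiplied by `ratio`, but its duration also changes by a factor of `1 / ratio`, so a real-time intercom would likely need a duration-preserving method instead. The function name `shift_pitch` is hypothetical.

```python
from typing import List, Sequence


def shift_pitch(samples: Sequence[float], ratio: float) -> List[float]:
    """Resample so that each frequency is multiplied by `ratio`.

    ratio < 1 lowers the pitch (and lengthens the signal);
    ratio > 1 raises it (and shortens the signal).
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    n_out = int(len(samples) / ratio)
    out: List[float] = []
    for i in range(n_out):
        pos = i * ratio                       # fractional read position in the input
        j = min(int(pos), len(samples) - 1)   # index of the left neighbour
        frac = pos - j
        nxt = samples[j + 1] if j + 1 < len(samples) else samples[j]
        out.append((1 - frac) * samples[j] + frac * nxt)  # linear interpolation
    return out
```

For example, `shift_pitch(voice, 0.7)` would bring a higher-pitched voice closer to a typical adult male range, where 0.7 is an illustrative value rather than one taken from the source.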

If the user does not want voice conversion, the intercom device 1 transmits the user response signal to the intercom terminal 2 as is (step S110). If the user wants voice conversion (step S108), the intercom device 1 configures the voice conversion processing so that the user's voice is converted once the call starts (step S109), and then transmits the user response signal to the intercom terminal 2 (step S110).
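The decision made across steps S106 to S109 can be summarized in a short sketch. It assumes the visitor has already been resolved to an identifier by the face lookup, and that stored requests are kept as a plain dictionary. The function name `resolve_conversion` and the identifier strings are hypothetical.

```python
from typing import Dict, Optional


def resolve_conversion(requests: Dict[str, bool],
                       visitor_id: str,
                       user_choice: Optional[bool] = None) -> bool:
    """Return True if the user's voice should be converted for this call."""
    if visitor_id in requests:
        # Registered visitor: reuse the past request (step S106).
        return requests[visitor_id]
    if user_choice is None:
        # First-time visitor: the user has to decide (step S304).
        raise ValueError("first-time visitor: a user choice is required")
    # Register the conversion or non-conversion request (step S107).
    requests[visitor_id] = user_choice
    # Conversion is set up only if requested (steps S108 and S109).
    return user_choice
```

On a later visit by the same person the stored request is applied without asking the user again, which matches the automatic behavior described for registered visitors.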

The intercom terminal 2 receives the user response signal (step S203) and starts the call with the user (step S204). When the call disconnect request signal from the user terminal 3 (step S307) is forwarded by the intercom device 1 (step S111), the call is disconnected (step S205).

[Second Embodiment]
FIG. 3 is a diagram showing the second embodiment of the present invention. In the second embodiment, the user sets voice conversion conditions in advance, and when a person whose features satisfy those conditions visits, the voice is converted automatically according to the conditions.

Using the operation means 33 of the user terminal 3, the user sets the desired voice conversion conditions (step S601). Examples of such conditions are "if the visitor is judged to be male, output a male voice" and "if the visitor is judged to be a woman or a child, do not convert the voice."

The menu items for these settings are sent from the intercom device 1 to the user terminal 3. FIG. 4 shows examples of the setting condition screen. Various attributes, such as the visitor's visit history, gender, age, and height, can be used as setting conditions. For example, in the setting condition screen of FIG. 4(A), the voice input at the user terminal 3 is converted to a male voice and output when the number of past visits is 0 or 1, and no voice conversion is performed when it is 2 or when it is 3 or more.
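The FIG. 4(A) settings described above amount to a small lookup table keyed by visit count. The sketch below is one way to encode it. The table layout, the use of `None` for "no conversion", and the name `conversion_for_visits` are assumptions made for illustration.

```python
from typing import Dict, Optional

# Settings from the FIG. 4(A) example: target voice per number of past visits.
# Key 3 stands for "3 or more"; None means the voice is not converted.
VISIT_RULES: Dict[int, Optional[str]] = {
    0: "male",
    1: "male",
    2: None,
    3: None,
}


def conversion_for_visits(past_visits: int) -> Optional[str]:
    """Return the target voice for a visitor with the given visit count."""
    if past_visits < 0:
        raise ValueError("past_visits must be non-negative")
    return VISIT_RULES[min(past_visits, 3)]
```

Clamping the count to 3 keeps the table the same size as the four rows on the settings screen.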

In the example of FIG. 4(A), the number of past visits is the voice conversion condition, but many other settings are conceivable, such as using the visitor's gender as shown in FIG. 4(B) or the visitor's height as shown in FIG. 4(C). It is also possible to estimate the visitor's age from face image data and use age as a condition. These conditions can also be combined. For gender and age estimation, existing techniques can be used, such as methods based on a support vector machine (SVM) or methods that prepare average face data and estimate the person's age and gender from it (see, for example, Patent Document 2).

When the user sets conditions on the setting condition screen in step S601, the user setting conditions are transmitted to the intercom device 1 and registered in the user DB 151 in the storage means 15 of the intercom device 1 (step S504).

After the user has set the conditions, when a visitor appears at the user's home, the intercom terminal 2 starts capturing video (step S401), and when the visitor presses the call button, it transmits a call signal to the intercom device 1 (step S402). On receiving the video and the call signal from the intercom terminal 2 (step S501), the intercom device 1 detects a person and the person's face in the video (step S502). For this person detection, a method such as detecting a moving object by background subtraction and then locating the face within the detected moving object may be used.
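Background subtraction can be illustrated with a minimal sketch that works on grayscale frames stored as nested lists. It assumes a fixed reference image of the empty entrance and a hand-picked intensity threshold, neither of which is specified in the source. The name `detect_moving_region` is hypothetical, and the face localization that would follow is not shown.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # grayscale image as rows of pixel intensities (0 to 255)


def detect_moving_region(background: Frame,
                         frame: Frame,
                         threshold: int = 30) -> Optional[Tuple[int, int, int, int]]:
    """Return the bounding box (x_min, y_min, x_max, y_max) of changed pixels.

    A pixel counts as changed when it differs from the background image by
    more than `threshold`. Returns None when nothing has changed.
    """
    xs: List[int] = []
    ys: List[int] = []
    for y, (bg_row, fr_row) in enumerate(zip(background, frame)):
        for x, (b, f) in enumerate(zip(bg_row, fr_row)):
            if abs(f - b) > threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return (min(xs), min(ys), max(xs), max(ys))
```

The returned box marks where a face detector would then search, instead of scanning the whole frame.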

Next, the intercom device 1 extracts feature quantities for face recognition from the detected face (step S503) and matches the extracted feature quantities against the user setting conditions (step S505). It then transmits the matching result to the user terminal 3 (step S506).
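The matching in step S505 could be sketched as an ordered list of rules evaluated against the visitor's estimated attributes. The rules below encode the example conditions given for step S601. The attribute names, the age cutoff used for "child", and the first-match-wins ordering are assumptions, not details from the source.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class VisitorAttributes:
    gender: str        # "male" or "female", as estimated from the face image
    age: int           # estimated age in years
    height_cm: float   # estimated height


Rule = Tuple[Callable[[VisitorAttributes], bool], Optional[str]]

# Each rule pairs a condition with a target voice; None means "do not convert".
RULES: List[Rule] = [
    (lambda v: v.age < 13, None),              # child (cutoff of 13 is assumed)
    (lambda v: v.gender == "female", None),    # female visitor: no conversion
    (lambda v: v.gender == "male", "male"),    # male visitor: answer in a male voice
]


def match_conversion(visitor: VisitorAttributes,
                     rules: List[Rule] = RULES,
                     default: Optional[str] = None) -> Optional[str]:
    """Return the target voice of the first rule the visitor satisfies."""
    for condition, target in rules:
        if condition(visitor):
            return target
    return default
```

Combined conditions, such as gender together with height, can be expressed by adding a rule whose condition tests both attributes.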

On receiving the result of matching the visitor against the user setting conditions (step S602), the user terminal 3 transmits a user response signal to the intercom device 1 (step S603). The intercom device 1 performs the voice conversion processing the user wants based on the matching result (step S507) and then transmits the user response signal to the intercom terminal 2 (step S508). On receiving the user response signal (step S403), the intercom terminal 2 starts the call with the user (step S404). When the call ends and the call disconnect request signal from the user terminal 3 (step S604) is forwarded by the intercom device 1 (step S509), the call is disconnected (step S405).

With the invention described above, a database of the faces of visitors who have visited once is registered in the user's intercom, and when a person designated in advance by the user visits again, the intercom can automatically respond with, for example, a male voice.

In addition, the user can set conditions in advance so that, for example, a male voice is output when the visitor is judged to be male from image features, and a female voice is output when the visitor is judged to be female.

When an intercom is used for communication between the entrance and the interior, an outsider can learn through the intercom that the resident is, for example, a woman or a child, which has been a security problem. By converting the voice, the present invention provides an effective measure against door-to-door salespeople and others who repeatedly visit the same home targeting women.

FIG. 1 is a diagram showing an example system configuration of the present invention.
FIG. 2 is a diagram showing the first embodiment of the present invention.
FIG. 3 is a diagram showing the second embodiment of the present invention.
FIG. 4 is a diagram showing examples of the setting condition screen.

Explanation of symbols

1 Intercom device
2 Intercom terminal
3 User terminal
11, 21, 31 Transmission/reception means
12 Voice conversion means
13 Person detection means
14 Feature quantity extraction means
15 Storage means
151 User DB
22 Video imaging means
23 Calling means
24, 34 Voice call means
32 Display means
33 Operation means

Claims

An intercom device used for communication between the outside and the room, comprising:
person detecting means for detecting a person from video taken by an outdoor camera;
feature quantity extracting means for extracting a feature quantity of the detected person;
storage means for registering and storing the extracted feature quantity of the person and the user's past voice conversion request for the person; and
voice conversion means for collating the registered feature quantity of the person with a feature quantity of a person extracted from the video and converting the user's voice based on the registered information of the user's voice conversion request.

An intercom device used for communication between the outside and the room, comprising:
storage means for registering and storing preset voice conversion conditions;
person detecting means for detecting a person from video taken by an outdoor camera;
feature quantity extracting means for extracting a feature quantity of the detected person; and
voice conversion means for converting the user's voice by matching the extracted feature quantity of the person against the voice conversion conditions.

A voice conversion method in an intercom device used for communication between the outside and the room, comprising the steps of:
detecting a person from video taken by an outdoor camera;
extracting a feature quantity of the detected person;
registering and storing the extracted feature quantity of the person and the user's past voice conversion request for the person; and
collating the registered feature quantity of the person with a feature quantity of a person extracted from the video and converting the user's voice based on the registered information of the user's voice conversion request.

A voice conversion method in an intercom device used for communication between the outside and the room, comprising the steps of:
registering and storing preset voice conversion conditions;
detecting a person from video taken by an outdoor camera;
extracting a feature quantity of the detected person; and
converting the user's voice by matching the extracted feature quantity of the person against the voice conversion conditions.

A voice conversion program for an intercom device, the program causing a computer to execute a voice conversion method in an intercom device used for communication between the outside and the room, the program causing the computer to execute:
a process of detecting a person from video taken by an outdoor camera;
a process of extracting a feature quantity of the detected person;
a process of registering and storing the extracted feature quantity of the person and the user's past voice conversion request for the person; and
a process of collating the registered feature quantity of the person with a feature quantity of a person extracted from the video and converting the user's voice based on the registered information of the user's voice conversion request.

A voice conversion program for an intercom device, the program causing a computer to execute a voice conversion method in an intercom device used for communication between the outside and the room, the program causing the computer to execute:
a process of registering and storing preset voice conversion conditions;
a process of detecting a person from video taken by an outdoor camera;
a process of extracting a feature quantity of the detected person; and
a process of converting the user's voice by matching the extracted feature quantity of the person against the voice conversion conditions.