JP2005275935A - Terminal device

Info

Publication number: JP2005275935A
Application number: JP2004089915A
Authority: JP
Inventors: Erina Takigawa; えりな瀧川; Ui Tada; 有為多田
Original assignee: Omron Corp; Omron Tateisi Electronics Co
Current assignee: Omron Corp
Priority date: 2004-03-25
Filing date: 2004-03-25
Publication date: 2005-10-06

Abstract

PROBLEM TO BE SOLVED: To improve the convenience of a user interface in a device that can be used even in noisy places such as busy streets, without requiring a card that records information about the user. SOLUTION: A facial image of the user is captured, and the user's attributes (race, sex, age, etc.) are estimated from that image. The convenience of the user interface is improved by controlling the content or mode of the output on the basis of the estimation result. COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to a technique that is effective when applied to devices that output information, options, and the like to a user.

Conventionally, the language used for the display and voice output of a terminal device was determined in advance according to, for example, the environment in which the device was installed. With ongoing globalization, however, terminal devices now need to support multiple languages regardless of where they are installed. For example, some ATMs (Automated Teller Machines) in the United States support many languages: by operating a touch panel or buttons, the user can switch subsequent screens to a selected language. However, the more language options there are, the more time and effort the user needs to select the desired language.

To address this problem, there is a technique that acquires user information recorded on an ID card, credit card, bankbook, or the like, which serves as a means of inputting personal identification information, and switches the language used according to the characteristics of that user information (see Patent Document 1). There is also a technique that recognizes the user's first utterance, identifies the user's language, and responds by voice in that language (see Patent Document 2). With these techniques, the user does not have to select the desired language explicitly, which reduces the effort required to operate the terminal device.
JP 2000-250840 A; JP 2003-316383 A

However, the conventional techniques described above have the following problems. With the technique that relies on recorded user information, the user must carry a recording medium, such as a card, that is compatible with the terminal device to be used. It is therefore difficult to apply this technique to terminal devices that do not need to identify their users (for example, public terminals such as ticket vending machines, tourist information terminals, and commentary devices in museums and art galleries).

With the technique that recognizes the language from the user's voice, it is difficult to recognize the voice accurately in noisy places such as busy streets. Devices that do not need to identify their users are often installed in exactly such noisy places, so recognizing the language from the user's voice is unsuitable for them.

An object of the present invention is therefore to solve these problems and to provide a device that improves the convenience of a user interface, that does not require a card or the like on which information about the user is recorded, and that can be used even in noisy places such as busy streets.

To solve the above problems, the present invention has the following configuration. A first aspect of the present invention is a terminal device comprising: imaging means for capturing an image of a user's face; detection means for detecting the user's face in the image captured by the imaging means; attribute estimation means for estimating an attribute of the user from the image of the face detected by the detection means; output control means for controlling the content and mode of output according to the estimation result of the attribute estimation means; and output means for producing output according to the control performed by the output control means.

According to the present invention, the user's attributes are estimated from an image of the user's face, without using the user's voice or information recorded on a card. The user therefore does not need to carry a recording medium such as a card, and the attributes can be estimated even in noisy places. The content and mode of the output are then controlled according to the estimated attributes (the estimation result).

Here, the user's attributes are, for example, information about the user's race, age group, and sex. The content and mode of output are, for example, the language and wording used for output, the content of the output text, the size of displayed characters, whether furigana (phonetic readings) are attached to displayed characters, the content and type of displayed images, the volume of the output voice, and the gender of the output voice. Because the content and mode of output are controlled according to the user's attributes, the convenience of the terminal device's user interface can be improved.

The attributes estimated by the attribute estimation means may include information about the user's race. In this case, the output control means may be configured to control the output means so that output is produced in a language corresponding to the estimated race.

In the present invention configured in this way, at least information about the user's race is estimated. Output is then produced in a language corresponding to the estimated race, so the user can easily understand the content of the output.

When the attributes estimated by the attribute estimation means include information about the user's race, the output control means may be configured to control the output means so that a plurality of language candidates corresponding to the estimated race are output. In this case, the terminal device may further comprise selection means with which the user selects, from among the language candidates, the language to be used by the output means. After a language has been selected via the selection means, the output control means may control the output means so that output is produced in the selected language.

In the present invention configured in this way, the user can use the selection means to select the language used by the output means. Rather than presenting every supported language as a candidate, the device presents only a plurality of languages corresponding to the user's estimated race. The language candidates for each race may be determined in advance based on, for example, statistics on the languages used by each race. This configuration reduces the effort the user needs to select a language.

The attributes may include information about the user's age group. In this case, when the estimated age group indicates that the user is elderly, the output control means may be configured to control the output means so that displayed characters are larger than normal and/or the output voice is louder than normal.

In the present invention configured in this way, the user's age group is estimated. When the user is judged to be elderly on the basis of this age group, characters are displayed larger than normal, or the voice is output at a higher volume than normal. Output is thus produced with characters that are easy for elderly users to read and at a volume that is easy for them to hear.

When the attribute estimation means performs estimation on a plurality of faces, the terminal device may further comprise estimation result selection means for selecting which of the resulting estimation results the output control means should use. In this case, the output control means may be configured to perform processing using the estimation result selected by the estimation result selection means.

The present invention may be realized by having an information processing apparatus execute a program. That is, the present invention can be specified as a program that causes an information processing apparatus to execute the processing performed by each of the means described above, or as a recording medium on which that program is recorded. The present invention may also be specified as a method by which an information processing apparatus executes the processing performed by each of the means described above.

According to the present invention, the user's attributes are estimated from a face image of the user. The attributes can therefore be estimated, and the convenience of the user interface improved accordingly, without a card or the like on which information about the user is recorded, and even in noisy places such as busy streets.

In the following description, a face image is an image that includes at least part or all of a person's face. A face image may therefore include the whole person, or only the person's face or upper body. It may also include a plurality of persons. Furthermore, the background may include any pattern, such as scenery other than a person (background: including objects that were the focus of attention as subjects) or a design.

[System configuration]
In terms of hardware, the terminal device 1 includes a CPU (central processing unit), a main storage device (RAM), an auxiliary storage device, and the like, connected via a bus. The auxiliary storage device is configured using a nonvolatile storage device. The nonvolatile storage device referred to here is a so-called ROM (Read-Only Memory, including EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), mask ROM, and the like), FRAM (Ferroelectric RAM), a hard disk, or the like.

FIG. 1 is a diagram showing the functional blocks of the terminal device 1. When the various programs (OS, applications, etc.) stored in the auxiliary storage device are loaded into the main storage device and executed by the CPU, the terminal device 1 functions as a device including an imaging unit 2, a face detection unit 3, a feature extraction unit 4, an attribute estimation unit 5, an attribute information determination unit 6, an output control unit 7, an output unit 8, and the like. The face detection unit 3, the feature extraction unit 4, the attribute estimation unit 5, the attribute information determination unit 6, and the output control unit 7 are realized by the CPU executing programs. Alternatively, these units may be implemented as dedicated chips. Each functional unit of the terminal device 1 is described next.

(Imaging unit)
The imaging unit 2 is configured to capture a face image of the user of the terminal device 1 and passes the captured image to the face detection unit 3. The imaging unit 2 is configured using, for example, a CCD (Charge-Coupled Device) or CMOS (Complementary Metal-Oxide Semiconductor) sensor.

The imaging unit 2 may be configured to capture images continuously, to capture an image when a sensor (not shown) detects that a user is about to use the terminal device 1, or to capture an image when the user operates the terminal device 1.

(Face detection unit)
The face detection unit 3 detects human faces in the image captured by the imaging unit 2 and determines face information indicating, for example, the position and size of each detected face. The face detection unit 3 may be configured to detect a face by, for example, any of the following:
- template matching using a reference template that corresponds to the contour of the whole face;
- template matching based on facial components (eyes, nose, ears, etc.);
- chroma key processing that finds a vertex such as the top of the head, followed by face detection based on that vertex;
- detection of a region whose color is close to skin color, which is then treated as the face;
- a neural network trained with teacher signals, which detects face-like regions as faces.
Any other existing technique may also be applied to realize the face detection processing of the face detection unit 3.

When a plurality of faces are detected in the image captured by the imaging unit 2, the face detection unit 3 may determine the face to be processed according to a predetermined criterion. The predetermined criterion is, for example, the size of the face, the orientation of the face, the position of the face in the image, or the result of distance measurement by distance measuring means (not shown; for example, means that measures distance by emitting a light beam toward an object and receiving the reflected light).
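As an illustration of such a criterion, the following minimal sketch picks the largest face and breaks ties by closeness to the image center. The `Face` type, the function name, and the tie-breaking rule are assumptions made for this example; the description only lists face size and position as possible criteria.

```python
from dataclasses import dataclass


@dataclass
class Face:
    """Hypothetical face information: a bounding box in pixels."""
    x: int       # left edge
    y: int       # top edge
    width: int
    height: int


def select_target_face(faces, image_width, image_height):
    """Pick the largest face; break ties by distance to the image center."""
    if not faces:
        return None
    cx, cy = image_width / 2, image_height / 2

    def key(face):
        area = face.width * face.height
        fx = face.x + face.width / 2
        fy = face.y + face.height / 2
        dist2 = (fx - cx) ** 2 + (fy - cy) ** 2
        # Larger area sorts first (negated); smaller distance wins ties.
        return (-area, dist2)

    return min(faces, key=key)
```

Other criteria named in the text, such as face orientation or a measured distance, could be added as further elements of the sort key.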

The face detection unit 3 passes the face information on the detected face to the feature extraction unit 4.

(Feature extraction unit)
The feature extraction unit 4 first sets a plurality of feature points on the face detected by the face detection unit 3 (feature point setting process). It then acquires, on the basis of the feature points set by the feature point setting process, the feature quantity of each feature point as the feature quantities of the subject's face (feature quantity acquisition process). The feature point setting process and the feature quantity acquisition process are described below.

<Feature point setting process>
In the feature point setting process, the feature extraction unit 4 first detects the organs of the detected face. Facial organs are, for example, the eyes, nose, nostrils, mouth (lips), eyebrows, chin, and forehead. The feature extraction unit 4 may detect any facial organ and may detect more than one. For example, the feature extraction unit 4 is configured to detect both eyes and the mouth of the subject's face.

Next, the feature extraction unit 4 converts the detected face image into a grayscale image. It also performs angle normalization and size normalization of the detected face image based on the positional relationship of the detected facial organs. These processes are collectively called preprocessing. The conversion to grayscale may be performed at any point in the processing of the face detection unit 3 or in the feature point setting process.
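The preprocessing can be sketched as follows. This is a minimal example under stated assumptions: the grayscale conversion uses the ITU-R BT.601 luma weights, and the normalization parameters are derived from the two eye positions only. Neither choice is specified in the description, and the function names and the target eye distance of 60 pixels are hypothetical.

```python
import math


def to_gray(r, g, b):
    """Grayscale value from RGB using BT.601 luma weights (an assumed choice)."""
    return 0.299 * r + 0.587 * g + 0.114 * b


def normalization_params(left_eye, right_eye, target_eye_distance=60.0):
    """Return (angle_deg, scale) for angle and size normalization.

    Rotating the image by -angle_deg levels the line between the eyes;
    scaling by `scale` makes the eye distance equal target_eye_distance.
    """
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    distance = math.hypot(dx, dy)
    if distance == 0:
        raise ValueError("eye positions coincide")
    angle_deg = math.degrees(math.atan2(dy, dx))  # tilt of the eye line
    return angle_deg, target_eye_distance / distance
```

Applying the resulting rotation and scaling to the pixel data would be done with whatever image library the implementation uses; that step is omitted here.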

Next, the feature extraction unit 4 sets the positions of a plurality of feature points based on the positions of the detected facial organs (hereinafter called "gaze points": for example, points indicating both eyes and the mouth). The feature extraction unit 4 sets feature points more densely near a gaze point and more sparsely farther from it.
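One simple way to obtain this dense-to-sparse layout is to place points on concentric rings whose radii grow geometrically. The sketch below is an illustration of that idea only, not the sampling scheme of the cited paper; the function name and all parameter values are assumptions.

```python
import math


def ring_points(center, n_rings=5, points_per_ring=8, r0=2.0, growth=1.6):
    """Feature point positions around one gaze point.

    Each ring holds the same number of points, but ring radii grow by a
    constant factor, so points are dense near the center and sparse far away.
    """
    cx, cy = center
    points = []
    for ring in range(n_rings):
        radius = r0 * growth ** ring
        for j in range(points_per_ring):
            theta = 2 * math.pi * j / points_per_ring
            points.append((cx + radius * math.cos(theta),
                           cy + radius * math.sin(theta)))
    return points
```

Calling `ring_points` once per gaze point (for example, each eye and the mouth) and concatenating the results would give the full set of feature points for a face.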

FIG. 2(a) is a diagram showing the face of the subject detected by the face detection unit 3. FIG. 2(b) is a diagram showing an example of the feature points set by the feature point setting process. In FIG. 2(b), the filled circles indicate gaze points and the hatched circles indicate feature points set on the basis of the gaze points. In the feature quantity acquisition process described below, the gaze points may also be treated as feature points.

Such a feature point setting process can be realized, for example, by applying the Retina sampling described in the following paper.

F. Smeraldi and J. Bigun, "Facial features detection by saccadic exploration of the Gabor decomposition", International Conference on Image Processing, ICIP-98, Chicago, October 4-7, volume 3, pages 163-167, 1998.

<Feature quantity acquisition process>
In the feature quantity acquisition process, the feature extraction unit 4 convolves Gabor filters at each feature point set by the feature point setting process. That is, the feature extraction unit 4 performs a Gabor wavelet transformation (GWT) at each feature point. FIG. 3 shows examples (real parts) of the Gabor filters used in the feature quantity acquisition process. By convolving a plurality of Gabor filters with different resolutions and orientations, as shown in FIG. 3, the feature extraction unit 4 acquires the periodicity and directionality of the gray-level features around each feature point as feature quantities.
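The following pure-Python sketch shows what convolving a bank of Gabor filters at one feature point can look like. It assumes one commonly used form of the Gabor wavelet (a Gaussian envelope times a complex plane wave with the DC component removed), takes the response magnitude as the feature, and uses a hypothetical window half-width of 8 pixels; the filter actually used in the patent may differ.

```python
import cmath
import math


def gabor(x, y, k, theta, sigma=math.pi):
    """Complex Gabor wavelet value at offset (x, y) from the kernel center."""
    envelope = (k * k / sigma ** 2) * math.exp(
        -k * k * (x * x + y * y) / (2 * sigma ** 2))
    carrier = cmath.exp(1j * k * (x * math.cos(theta) + y * math.sin(theta)))
    dc = math.exp(-sigma ** 2 / 2)  # subtracting this removes the DC response
    return envelope * (carrier - dc)


def gabor_features(image, px, py, ks, thetas, half=8):
    """Response magnitudes at pixel (px, py) of a 2-D grayscale list of rows."""
    h, w = len(image), len(image[0])
    feats = []
    for k in ks:              # k sets the resolution (spatial frequency)
        for theta in thetas:  # theta sets the orientation
            acc = 0j
            for dy in range(-half, half + 1):
                for dx in range(-half, half + 1):
                    yy, xx = py + dy, px + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += image[yy][xx] * gabor(dx, dy, k, theta)
            feats.append(abs(acc))
    return feats
```

For an image of vertical stripes, the filter whose orientation matches the direction of intensity variation responds strongly while the perpendicular one responds weakly, which is how the filter bank captures directionality.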

Equation 1 is the expression representing the Gabor filter. By changing the values of k and θ in the expression, the Gabor filter can extract any desired periodicity and directionality from the gray-level features as feature quantities.
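The image of Equation 1 is not reproduced in this text. As an assumption, a widely used form of the two-dimensional Gabor wavelet, parameterized by the wave number k and the orientation θ mentioned above, is shown below; the symbol σ (the relative width of the Gaussian envelope) is introduced here for illustration and may not match the original notation.

```latex
\psi_{k,\theta}(x, y) = \frac{k^{2}}{\sigma^{2}}
  \exp\!\left(-\frac{k^{2}\,(x^{2}+y^{2})}{2\sigma^{2}}\right)
  \left[\exp\!\bigl(i k (x\cos\theta + y\sin\theta)\bigr)
        - \exp\!\left(-\frac{\sigma^{2}}{2}\right)\right]
```

In this form, k controls the spatial frequency (periodicity) and θ the direction of the plane wave, which matches the roles the text assigns to the two parameters.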

The feature extraction unit 4 passes the feature quantity of each feature point obtained by the feature quantity acquisition process to the attribute estimation unit 5. The feature extraction unit 4 may be configured to process all faces, among those detected by the face detection unit 3, that satisfy a predetermined condition. The predetermined condition is, for example, that the face is at least a predetermined size, is at a predetermined position (for example, in the central region of the image), or has a predetermined orientation (for example, facing the front).

(Attribute estimation unit)
The attribute estimation unit 5 estimates the attribute information of the subject detected by the face detection unit 3, based on the feature quantity of each feature point received from the feature extraction unit 4. Attribute information is information about the person and includes items such as race, age group, and sex. The attribute estimation unit 5 estimates the subject's attribute information by inputting the feature quantity of each feature point to a pattern recognition classifier whose training has been completed in advance. The attribute estimation unit 5 uses a support vector machine (SVM) as the pattern recognition classifier. The support vector machine is described below, taking the estimation of race as an example.

A support vector machine is a pattern recognition technique that uses, as its decision boundary, a hyperplane passing midway between the training data of two classes. Using the discriminant function shown in Equation 2, the classifier of a support vector machine estimates which of the two classes the input data (here, the feature quantities at all feature points) belongs to.
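The image of Equation 2 is not reproduced in this text. Given the symbols that the description defines for it (l, α_i, x_i, y_i, b, and the kernel function K), the standard support vector machine discriminant function is presumably what is meant:

```latex
f(x) = \sum_{i=1}^{l} \alpha_i \, y_i \, K(x, x_i) + b
```

The sign of f(x) indicates which of the two classes the input x is assigned to.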

In Equation 2, l denotes the number of training samples selected by the learning process, that is, the number of training samples used in the attribute estimation performed by the attribute estimation unit 5. α_i denotes a Lagrange multiplier. x_i and y_i denote the training data: y_i takes the value −1 or 1 and indicates which of the two classes x_i belongs to. b denotes the bias term, a parameter. These values are determined by the learning process, and the attribute estimation unit 5 stores the result of that learning process.

In Equation 2, K denotes a kernel function. A nonlinear extension of the support vector machine has been proposed in which a kernel function maps the input data nonlinearly into a higher-dimensional space, which makes it possible to build classifiers that are more effective on real problems. Typical kernel functions include the polynomial kernel (see Equation 3) and the Gaussian kernel (see Equation 4). Any kernel function may be used in the attribute estimation unit 5.
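The images of Equations 3 and 4 are likewise not reproduced here. The sketch below uses common textbook forms of the two kernels, (x·x_i + 1)^p and exp(−‖x − x_i‖² / (2σ²)), together with the standard SVM discriminant function. The exact parameterization in the patent may differ, and all function names and default parameter values are assumptions.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def polynomial_kernel(u, v, degree=2):
    """Polynomial kernel (u . v + 1)^degree."""
    return (dot(u, v) + 1.0) ** degree


def gaussian_kernel(u, v, sigma=1.0):
    """Gaussian kernel exp(-||u - v||^2 / (2 sigma^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq / (2.0 * sigma ** 2))


def decision(x, support_vectors, labels, alphas, bias, kernel):
    """f(x) = sum_i alpha_i * y_i * K(x, x_i) + b."""
    return sum(a * y * kernel(x, sv)
               for a, y, sv in zip(alphas, labels, support_vectors)) + bias


def classify(x, support_vectors, labels, alphas, bias, kernel):
    """Return 1 or -1 according to the sign of the discriminant function."""
    value = decision(x, support_vectors, labels, alphas, bias, kernel)
    return 1 if value >= 0 else -1
```

In a trained classifier the support vectors, labels, multipliers, and bias would come from the stored learning result; here they are simply passed in as arguments.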

A support vector machine is a learning method that builds a classifier for distinguishing two classes, so a plurality of support vector machines must be combined in order to discriminate (estimate) among three or more races. The attribute estimation unit 5 realizes multi-class classification with support vector machines by applying a binary tree search. FIG. 4 is a diagram showing an example of the binary tree search applied to the race estimation processing in the attribute estimation unit 5. The processing is described here using the example of estimating whether the subject's race is Caucasoid, Negroid, or Mongoloid. Other races may be included as further candidates, depending on the design.

First, the attribute estimation unit 5 uses a support vector machine to estimate whether the subject is Negroid. If the subject is estimated not to be Negroid, the attribute estimation unit 5 then estimates whether the subject is Mongoloid (or whether the subject is Caucasoid). For the items of attribute information other than race, namely age group and sex, the attribute estimation unit 5 performs estimation in the same way by binary tree search. The attribute estimation unit 5 then passes the estimated attribute information to the attribute information determination unit 6. The attribute estimation unit 5 may be configured to process all faces, among those detected by the face detection unit 3, that satisfy a predetermined condition.
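The chained yes/no decisions described above can be sketched as a cascade of binary classifiers. In this minimal example the classifiers are passed in as plain callables, so simple stand-in functions take the place of trained support vector machines; the function name and the generic class labels are hypothetical.

```python
def estimate_by_cascade(features, stages, fallback):
    """Walk a chain of binary classifiers and return the first matching label.

    stages: list of (label, classifier) pairs; classifier(features) returns
    True when the input is judged to belong to that label.
    fallback: label returned when every stage answers "no".
    """
    for label, is_member in stages:
        if is_member(features):
            return label
    return fallback
```

For three classes, two stages suffice: the first decides "class 1 or not", the second decides between the remaining two. The same structure applies to age groups, with one stage per boundary between groups.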

(Attribute information determination unit)
When a plurality of faces are detected, attribute information is obtained for each face. The values of the items in these sets of attribute information may differ from face to face. In that case, a single value must be determined for each item of the attribute information output to the output control unit 7. The attribute information determination unit 6 therefore performs an attribute information determination process to decide which value to output for each item. A specific example of the attribute information determination process is described below.

First, for each item, the attribute information determination unit 6 groups the faces into sets according to the value they have for that item. Next, the attribute information determination unit 6 obtains the attention level of each set. The attention level is a value determined from, for example, the face information of the faces included in the set, such as the number of faces in the set and their sizes, orientations, and positions. The attribute information determination unit 6 then adopts, as the value of the item, the value of the set with the highest attention level. By determining a value for each item in this way, the attribute information determination unit 6 generates a single set of attribute information and passes it to the output control unit 7.

FIG. 5 is a table showing an example of the attribute information estimated by the attribute estimation unit 5. A specific example of the attribute information determination process is described below using the example of FIG. 5, assuming that the attention level of a set is higher the more faces it contains. The attribute information determination unit 6 generates a set for each value of each of the items race, age group, and sex. The Caucasoid set is (A), the Negroid set is (B, E), and the Mongoloid set is (C, D, F). Similarly, the 10-20 set is (E), the 20-30 set is (A, B, C, D, F), the male set is (A, F), and the female set is (B, C, D, E). Next, the attribute information determination unit 6 obtains the attention level of each set. Here the attention level is the number of faces in the set, so (Caucasoid, Negroid, Mongoloid) = (1, 2, 3), (10-20, 20-30) = (1, 5), and (male, female) = (2, 4). As a result, the attribute information determination unit 6 generates the attribute information (Mongoloid, 20-30, female) and passes it to the output control unit 7. When only one face is detected, the attribute information determination unit 6 may pass the attribute information for that face to the output control unit 7 without performing the attribute information determination process.
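The worked example can be reproduced with a short per-item vote. In the sketch below, the per-face table is reconstructed from the sets listed in the text (FIG. 5 itself is not reproduced here), and the function and variable names are hypothetical. Ties, which the text does not address, go to the value encountered first.

```python
from collections import Counter

# Per-face values reconstructed from the sets listed for FIG. 5.
FIGURE_5 = {
    "A": {"race": "Caucasoid", "age": "20-30", "sex": "male"},
    "B": {"race": "Negroid", "age": "20-30", "sex": "female"},
    "C": {"race": "Mongoloid", "age": "20-30", "sex": "female"},
    "D": {"race": "Mongoloid", "age": "20-30", "sex": "female"},
    "E": {"race": "Negroid", "age": "10-20", "sex": "female"},
    "F": {"race": "Mongoloid", "age": "20-30", "sex": "male"},
}


def decide_attributes(per_face, items=("race", "age", "sex")):
    """For each item, adopt the value shared by the largest number of faces.

    The attention level of a set is taken to be its face count, as in the
    worked example; other measures (size, position) would replace Counter.
    """
    decided = {}
    for item in items:
        counts = Counter(face[item] for face in per_face.values())
        decided[item] = counts.most_common(1)[0][0]
    return decided
```

Note that the items are decided independently, so the resulting combination need not correspond to any single detected face.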

(Output control unit)
The output control unit 7 controls what is output by the output unit 8 based on the attribute information passed from the attribute information determination unit 6. The output control unit 7 controls the display and sound output from the output unit 8 according to the age included in the attribute information. For example, when the age is 0 to 10, the output may be controlled so that furigana is added to displayed kanji, so that kanji is not used in the display at all, or so that animation or the like is added to the display. When the age is 60 or over, the output may be controlled so that characters are displayed larger than normal, or so that the audio volume is higher than normal.
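As a rough illustration of the age-dependent control described above, the following sketch maps an estimated age range to display and audio settings. The `OutputSettings` fields and the 1.5x scale factors are assumptions made for the example; the specification only says "larger than normal" and "higher than normal".

```python
from dataclasses import dataclass


@dataclass
class OutputSettings:
    font_scale: float = 1.0    # 1.0 = normal character size
    volume_scale: float = 1.0  # 1.0 = normal audio volume
    furigana: bool = False     # add readings to displayed kanji
    animation: bool = False    # add animation to the display


def settings_for_age(age_range):
    """Return settings for an estimated age range (low, high).

    `high` is None for an open-ended range such as "60 or over".
    """
    low, high = age_range
    settings = OutputSettings()
    if high is not None and high <= 10:
        # Young users: make kanji readable and the display more engaging.
        settings.furigana = True
        settings.animation = True
    elif low >= 60:
        # Older users: larger characters and louder audio (factors assumed).
        settings.font_scale = 1.5
        settings.volume_scale = 1.5
    return settings


print(settings_for_age((0, 10)))
print(settings_for_age((60, None)))
```

Any other age range falls through to the default settings, which corresponds to the normal output.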

The output control unit 7 may also be configured to control the color and design of the displayed characters, the gender of the output voice, and the like according to the sex included in the attribute information.

The output control unit 7 may also select the language candidates to be output by the output unit 8 according to, for example, the race included in the attribute information. Language candidates are the options from which the user selects the language to be displayed or used. For example, the language candidates corresponding to Mongoloid are Korean, Japanese, Thai, Chinese, Taiwanese, Mandarin, Arabic, Mongolian, Lao, and the like. The language candidates corresponding to Caucasoid are, for example, English, French, German, Italian, Spanish, Dutch, Russian, Portuguese, Greek, and the like. The language candidates corresponding to Negroid are, for example, Arabic, Portuguese, English, French, Swahili, and the like. In this case, the user selects the language to be used from among the language candidates via an input unit (not shown), and the output control unit 7 controls the output unit 8 to perform output in the selected language. Alternatively, the output control unit 7 may be configured to determine the language to be used itself, regardless of any selection by the user, and to control the output unit 8 to perform output in that language.
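The candidate narrowing described above amounts to a table lookup followed by a user selection. The sketch below uses the lists given in the text; the function names are hypothetical, and falling back to the union of all candidates when no estimate is available is an assumption of this example.

```python
# Language candidates per estimated category, as listed in the description.
LANGUAGE_CANDIDATES = {
    "Mongoloid": ["Korean", "Japanese", "Thai", "Chinese", "Taiwanese",
                  "Mandarin", "Arabic", "Mongolian", "Lao"],
    "Caucasoid": ["English", "French", "German", "Italian", "Spanish",
                  "Dutch", "Russian", "Portuguese", "Greek"],
    "Negroid": ["Arabic", "Portuguese", "English", "French", "Swahili"],
}


def all_languages():
    """Union of every candidate list, in first-seen order, without duplicates."""
    seen = []
    for languages in LANGUAGE_CANDIDATES.values():
        for language in languages:
            if language not in seen:
                seen.append(language)
    return seen


def candidates_for(race):
    """Candidates to present; all languages if the estimate is missing or unknown."""
    return LANGUAGE_CANDIDATES.get(race) or all_languages()


def choose_language(race, user_choice):
    """Accept the user's choice only if it is among the presented candidates."""
    if user_choice not in candidates_for(race):
        raise ValueError(f"{user_choice!r} was not offered")
    return user_choice


print(candidates_for("Negroid"))
print(choose_language("Mongoloid", "Japanese"))
```

Because the estimate can be wrong, a practical interface would presumably also let the user reach the full list from the narrowed one; the description does not address this.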

(Output unit)
The output unit 8 is configured using a display device such as a CRT (Cathode Ray Tube) or a liquid crystal display, an audio output device such as a speaker, and the like. The output unit 8 may be configured using a touch panel or the like. The output unit 8 displays characters and images, outputs audio, and so on according to the control performed by the output control unit 7.

[Operation example]
FIG. 6 is a flowchart showing an operation example of the terminal device 1, which is described below with reference to that figure. First, when the imaging unit 2 captures an image of the user (S01), the face detection unit 3 executes the face detection process on the captured image and detects the user's face (S02). Next, the feature extraction unit 4 and the attribute estimation unit 5 execute the attribute estimation process on the face detected by the face detection process (S03).

FIG. 7 is a flowchart showing a processing example of the attribute estimation process, which is described below with reference to that figure. When the attribute estimation process starts, the feature extraction unit 4 detects the points of interest, that is, the facial organs, in the face detected by the face detection unit 3 (S08). Next, the feature extraction unit 4 performs preprocessing on the detected face image based on, for example, the positions of the points of interest (S09), and sets feature points (S10). The feature extraction unit 4 then acquires feature values by convolving a Gabor filter at each feature point (S11).
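Step S11 can be pictured with a minimal pure-Python sketch that evaluates the real part of a Gabor kernel at each feature point. The kernel parameters (`radius`, `wavelength`, `theta`, `sigma`, `gamma`), the zero padding at the image border, and the function names are all assumptions for illustration; this passage does not fix them.

```python
import math


def gabor_kernel(radius, wavelength, theta, sigma, gamma=1.0):
    """Real part of a Gabor kernel on a (2*radius+1) x (2*radius+1) grid."""
    kernel = []
    for dy in range(-radius, radius + 1):
        row = []
        for dx in range(-radius, radius + 1):
            # Rotate coordinates so the carrier wave runs along direction theta.
            xr = dx * math.cos(theta) + dy * math.sin(theta)
            yr = -dx * math.sin(theta) + dy * math.cos(theta)
            envelope = math.exp(-(xr * xr + (gamma * yr) ** 2) / (2.0 * sigma * sigma))
            row.append(envelope * math.cos(2.0 * math.pi * xr / wavelength))
        kernel.append(row)
    return kernel


def response_at(image, point, kernel):
    """Filter response at one feature point (x, y); outside pixels count as 0.

    The kernel above is point-symmetric, so this weighted sum equals the
    convolution value at that point.
    """
    x0, y0 = point
    radius = len(kernel) // 2
    height, width = len(image), len(image[0])
    total = 0.0
    for ky, row in enumerate(kernel):
        for kx, weight in enumerate(row):
            x, y = x0 + kx - radius, y0 + ky - radius
            if 0 <= x < width and 0 <= y < height:
                total += image[y][x] * weight
    return total


def feature_vector(image, points, kernels):
    """Concatenate the responses of every kernel at every feature point."""
    return [response_at(image, p, k) for p in points for k in kernels]


image = [[1] * 9 for _ in range(9)]  # toy 9x9 grayscale image
kernels = [gabor_kernel(2, 4.0, t, 2.0) for t in (0.0, math.pi / 2)]
print(feature_vector(image, [(4, 4), (2, 6)], kernels))
```

Evaluating the filter only at the feature points, rather than over the whole image, keeps the feature vector short: its length is the number of feature points times the number of kernels.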

When the feature extraction unit 4 has acquired the feature values, the attribute estimation unit 5 estimates the attributes of the subject (race, age, sex, and so on) by feeding the acquired feature values to a support vector machine (S12). The attribute estimation unit 5 then outputs the result of the attribute estimation (S13).

The processing from S03 onward is described with reference to FIG. 6. After the attribute estimation process (S03), the attribute information determination unit 6 decides the subsequent processing according to the number of faces detected by the face detection unit 3. When the number of faces is one or fewer (S04: <= 1), the attribute information determination unit 6 notifies the output control unit 7 accordingly. The output control unit 7 then controls the output unit 8 (S06) as follows: when one face has been detected, output is based on the attribute information of that face; when no face has been detected, the output prepared for that case is used (for example, output at the normal volume and normal character size, or output that displays all language candidates). The output unit 8 then performs output according to the control content (S07).

Returning to the description of S04: when the number of faces is greater than one (S04: > 1), the attribute information determination unit 6 executes the attribute information determination process (S05). FIG. 8 is a flowchart showing a processing example of the attribute information determination process, which is described below with reference to that figure. When the attribute information determination process starts, the attribute information determination unit 6 acquires the sets for each item (S14). Next, the attribute information determination unit 6 acquires the attention level of each set (S15). It then determines the value of each item according to the attention levels of the sets (S16). For example, the attribute information determination unit 6 may adopt the value of the set with the highest attention level as the value of the item. Finally, the attribute information determination unit 6 generates attribute information having the values determined for the items and outputs it (S17).

The processing from S05 onward is described with reference to FIG. 6. After the attribute information determination process (S05), the output control unit 7 controls the output unit 8 to perform output based on the attribute information output by the attribute information determination process (S06). The output unit 8 then performs output according to the control content (S07).
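The overall flow of FIG. 6, including the branch at S04, can be summarized in a short sketch. Each unit is passed in as a plain callable; the function names and signatures are hypothetical and serve only to show the order of the steps.

```python
def decide_attributes(estimates, determine):
    """S04/S05: choose the attribute information handed to output control.

    Returns None when no face was detected, the single estimate when exactly
    one face was detected, and the result of `determine` otherwise.
    """
    if len(estimates) == 0:
        return None
    if len(estimates) == 1:
        return estimates[0]
    return determine(estimates)


def run_once(capture, detect_faces, estimate, determine, control, output):
    """One pass through steps S01 to S07."""
    image = capture()                                      # S01: imaging
    faces = detect_faces(image)                            # S02: face detection
    estimates = [estimate(image, face) for face in faces]  # S03: attribute estimation
    attributes = decide_attributes(estimates, determine)   # S04, S05
    plan = control(attributes)                             # S06: output control
    output(plan)                                           # S07: output
    return plan


# Toy stand-ins for the units, only to show the call pattern.
plan = run_once(
    capture=lambda: "image",
    detect_faces=lambda image: ["face-1", "face-2"],
    estimate=lambda image, face: {"age": "20-30"},
    determine=lambda estimates: estimates[0],
    control=lambda attributes: {"based_on": attributes},
    output=print,
)
```

Here `control` is expected to handle `None` by producing the default output prepared for the case where no face is detected.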

[Action / Effect]
The terminal device 1 estimates the user's attribute information from the user's face image and controls the output content based on the estimated attribute information. For example, the terminal device 1 may be configured to estimate the user's race as attribute information and to present candidates for the user's language (language candidates) inferred from that race. In this case, the user can select the language to be used not from a large number of language candidates but from candidates that have been narrowed down to some extent by the race estimation process, the output control process, and so on. This reduces the effort the user needs to select a language. As another example, the terminal device 1 may be configured to estimate the user's age as attribute information and to perform output suited to the user's condition as inferred from that age. Specifically, when the estimated age is high, the displayed characters may be enlarged or the output volume may be raised. When the estimated age is low, the display may avoid difficult kanji or may add furigana. In this case, the user receives output suited to the physical and intellectual abilities that differ by age, and the convenience of the user interface is improved. As a further example, the terminal device 1 may be configured to estimate the user's sex as attribute information and to perform output suited to the user's preferences as inferred from that sex. In this case, the user can receive output that feels more familiar.

In addition, the terminal device 1 estimates the user's attributes from the user's face image. The terminal device 1 can therefore be installed in noisy places and can operate effectively there.

In addition, the terminal device 1 estimates the user's attributes without requiring a recording medium on which attribute information or the like is registered. When operating the terminal device 1, the user can therefore be provided with a user interface suited to his or her attribute information without using a recording medium on which that information is recorded.

[Modification]
The attribute information determination unit 6 may be configured to make the determination in S04 using the number of faces subjected to the attribute estimation process rather than the number of faces detected by the face detection unit 3.

The output control unit 7 may also be configured to control the content of the output text, words, images, and the like according to the attribute information. For example, the output control unit 7 may perform control so that content that is familiar or easy to understand for the user's age group is output. The names people are familiar with and the words they can understand clearly differ by age group. Likewise, in tourist information for example, the content a user would want to be guided to differs by age and sex. Performing such control therefore allows the user to obtain more useful information.

FIG. 1 is a diagram showing the functional blocks of the terminal device.
FIG. 2 is a diagram showing an example of feature point setting.
FIG. 3 is a diagram showing an example of a Gabor filter.
FIG. 4 is a diagram showing an example of a binary tree search.
FIG. 5 is a table showing an example of the attribute information estimated by the attribute estimation unit.
FIG. 6 is a flowchart showing an operation example of the terminal device.
FIG. 7 is a flowchart showing an operation example of the attribute estimation process.
FIG. 8 is a flowchart showing an operation example of the attribute information determination process.

Explanation of symbols

1 Terminal device
2 Imaging unit
3 Face detection unit
4 Feature extraction unit
5 Attribute estimation unit
6 Attribute information determination unit
7 Output control unit
8 Output unit

Claims

1. A terminal device comprising:
imaging means for imaging a user's face;
detection means for detecting the user's face from an image captured by the imaging means;
attribute estimation means for estimating an attribute of the user from the image of the face detected by the detection means;
output control means for controlling the content and mode of output according to the estimation result of the attribute estimation means; and
output means for performing output according to the control content of the output control means.

2. The terminal device according to claim 1, wherein
the attribute includes information about the race of the user, and
the output control means controls the output means to perform output in a language corresponding to the estimated race.

3. The terminal device according to claim 1, wherein
the attribute includes information about the race of the user,
the output control means controls the output means to output a plurality of language candidates according to the estimated race,
the terminal device further comprises selection means for the user to select, from the plurality of language candidates, a language to be used by the output means, and
the output control means controls the output means to perform output in the selected language after the language has been selected via the selection means.

4. The terminal device according to claim 1, wherein
the attribute includes information about the age of the user, and
when the estimated age can be judged to be elderly, the output control means controls the output means to display characters larger than normal and/or to output audio at a volume higher than normal.

5. The terminal device according to claim 1, further comprising estimation result selection means for selecting, when the attribute estimation means has performed estimation processing on a plurality of faces, which of the resulting plurality of estimation results is to be used for processing by the output control means, wherein
the output control means performs processing using the estimation result selected by the estimation result selection means.

6. A program for causing an information processing apparatus to execute the steps of:
imaging a user's face;
detecting the user's face from the captured image;
estimating an attribute of the user from the image of the detected face;
controlling the content and mode of output according to the estimation result; and
performing output according to the control content.

7. A method comprising the steps of:
an information processing apparatus imaging a user's face;
the information processing apparatus detecting the user's face from the captured image;
the information processing apparatus estimating an attribute of the user from the image of the detected face;
the information processing apparatus controlling the content and mode of output according to the estimation result; and
the information processing apparatus performing output according to the control content.