JP5180116B2

JP5180116B2 - Nationality determination device, method and program

Info

Publication number: JP5180116B2
Application number: JP2009032892A
Authority: JP
Inventors: 素寺横
Original assignee: Fujifilm Corp
Current assignee: Fujifilm Corp
Priority date: 2009-02-16
Filing date: 2009-02-16
Publication date: 2013-04-10
Anticipated expiration: 2029-02-16
Also published as: JP2010191530A

Description

The present invention relates to a technique for providing information according to the attributes of a target person.

In Patent Document 1, person attribute identification models are prepared in advance, one for each of several types of person attribute identification data, so that each type can be identified individually. A person attribute identification data creation means creates the several types of person attribute identification data from a frame image to be processed, obtained by photographing a person with a camera. An index value calculation means 37 then uses the person attribute identification models to calculate an index value, such as a likelihood, for each type of person attribute identification data individually, and an identification result information calculation means finally integrates the multiple index values.

In Patent Document 2, a face recognition unit photographs a person in front of a camera, detects the face portion in the image, and determines which category of facial features the detected face image most resembles with respect to attributes such as age, gender, occupation, nationality, birthplace, and facial expression. A production control unit performs overall control of the system, including processing input from visitors' operations, managing visitor history information files, aggregating rankings, sending photographing commands to the face recognition unit, and sending e-mail. It also controls presentation displays such as operating instructions for visitors, attribute determination results, and rankings. A visitor information storage unit stores, in a storage area, the history information file and the photographed face image file created for each visitor.

Patent Documents 3 and 4 are examples of conventional multilingual speech recognition systems, and Patent Document 5 is an example of conventional clothing recognition.

Patent Document 6 is an example of a facial feature point detection technique. Several types of feature points of a predetermined object in a detection target image are first determined provisionally using two components: a first group of high-tolerance feature point detectors generated by machine learning, and a first high-tolerance positional relationship model, generated by statistical learning, that defines the positional relationships among those feature points and constrains them. Then, in the vicinity of each provisional feature point, a second group of low-tolerance feature point detectors generated by machine learning and a second low-tolerance positional relationship model generated by statistical learning are used to determine the final feature points, again constrained by the positional relationships.

Patent Document 7 is an example of human body region extraction. A face region F is detected in an image, and a candidate region C likely to contain the human body region is determined from the position of F. Each unit area of the candidate region is checked for whether it contains part of the human body region, and the set of unit areas judged to contain it is taken as an estimated region E. The human body region Hu within E is extracted. E is then repeatedly expanded and updated, and the human body region within the updated E re-extracted, until it is judged that Hu no longer extends into the area around the contour of E.

Patent Document 8 is an example of face extraction: it extracts the shape of the face and the contours of the eyes and mouth according to their degree of match with templates.

Patent Document 9 is an example of pupil region extraction. The eyes are extracted using edge detection, shape pattern detection, position information, and the like. A low-luminance region is then extracted based on the luminance histogram of the eye image data, and this region is contracted to obtain the pupil region.
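The histogram-threshold-then-contract idea described for Patent Document 9 can be illustrated with a minimal pure-Python sketch. It assumes the eye has already been cropped into a 2D grid of 8-bit luminance values. The threshold rule (treat the darkest `dark_fraction` of pixels as low luminance) and the use of a single 3×3 erosion as the "contraction" are illustrative assumptions, not the procedure of Patent Document 9, and the function name `pupil_mask` is hypothetical.

```python
def pupil_mask(eye, dark_fraction=0.1):
    """Return a boolean pupil mask for a cropped eye image (illustrative sketch).

    eye: 2D list of integer luminance values in 0..255 (assumed pre-cropped).
    dark_fraction: assumed share of pixels treated as "low luminance".
    """
    # Luminance histogram of the eye image.
    hist = [0] * 256
    for row in eye:
        for value in row:
            hist[value] += 1

    # Threshold: smallest luminance level at which the cumulative count
    # reaches dark_fraction of all pixels.
    target = dark_fraction * sum(hist)
    cumulative, threshold = 0, 255
    for level, count in enumerate(hist):
        cumulative += count
        if cumulative >= target:
            threshold = level
            break

    h, w = len(eye), len(eye[0])
    dark = [[eye[y][x] <= threshold for x in range(w)] for y in range(h)]

    # "Contract" the low-luminance region with one 3x3 erosion: a pixel
    # survives only if it and all 8 neighbours are dark. Border pixels are
    # dropped because their neighbourhood is incomplete.
    mask = [[False] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            mask[y][x] = all(
                dark[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)
            )
    return mask
```

The erosion discards isolated dark specks (for example eyelash pixels or noise) while keeping the interior of a compact dark blob, which is the practical purpose of contracting the low-luminance region.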

Patent Document 10 is an example of nose shape extraction. Templates of a predetermined shape are matched against an edge image within a predetermined search area. From the detected candidates with large matching values, a pair satisfying a symmetry condition is selected and taken as the positions of the nasal wings. Each side of the nose is represented by a polynomial curve fitted to the detected nasal-wing templates and to three points interpolated, with predetermined coefficients, between the nasal wings and the corners of the eyes. Finally, the nose tip and the lower curve of the nose are located using predetermined interpolation coefficients.

Patent Document 10 also gives an example of mouth shape extraction. First, a rectangle around the mouth is initialized. By analyzing the moments of non-skin-colored pixels, the initial rectangle is shrunk to a tighter bounding box. A lip function image is then constructed that uses the pixels in the refined bounding box to measure the probability that each pixel belongs to the lips rather than the skin. The lip outline is initialized as an ellipse obtained from the second-order central moments of the pixels with high lip function values. The outline is then moved dynamically by external and internal forces, and a polynomial is fitted to the resulting points to produce a curve representation.
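The ellipse initialization step (second-order central moments of high lip-function pixels) can be sketched as follows. This is a generic moment-based ellipse fit, not the exact formulation of Patent Document 10; the input format (a dict of pixel coordinates to lip probabilities), the 0.5 threshold, and the function name `init_lip_ellipse` are assumptions for illustration.

```python
import math


def init_lip_ellipse(lip_prob, threshold=0.5):
    """Fit an initial ellipse to pixels with a high lip-function value.

    lip_prob: dict mapping (x, y) -> probability that the pixel is lip.
    threshold: assumed cut-off for a "high" lip-function value.
    Returns (centre, semi_major, semi_minor, angle_in_radians).
    """
    points = [p for p, prob in lip_prob.items() if prob >= threshold]
    if not points:
        raise ValueError("no pixel reaches the lip threshold")
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n

    # Second-order central moments, normalised by the pixel count.
    mu20 = sum((x - cx) ** 2 for x, _ in points) / n
    mu02 = sum((y - cy) ** 2 for _, y in points) / n
    mu11 = sum((x - cx) * (y - cy) for x, y in points) / n

    # Eigenvalues of the 2x2 covariance matrix are the variances along the
    # principal axes. For a uniformly filled ellipse the variance along an
    # axis is (semi-axis)^2 / 4, so semi-axis = 2 * sqrt(variance).
    half_trace = (mu20 + mu02) / 2
    spread = math.sqrt(((mu20 - mu02) / 2) ** 2 + mu11 ** 2)
    semi_major = 2 * math.sqrt(half_trace + spread)
    semi_minor = 2 * math.sqrt(max(half_trace - spread, 0.0))
    # Orientation of the major axis relative to the x axis.
    angle = 0.5 * math.atan2(2 * mu11, mu20 - mu02)
    return (cx, cy), semi_major, semi_minor, angle
```

For a wide, flat blob of lip pixels centred on the origin, the fit returns a centre at (0, 0), a semi-major axis along x that is longer than the semi-minor axis, and an angle of 0, which is a reasonable starting outline for the subsequent contour refinement.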

Patent Document 11 is an example of hairstyle detection: a hairstyle is identified by matching hairstyle model shape patterns against an image.

Patent Document 12 is an example of estimating gender and age from a face detected in an image. Reference feature vectors are prepared for reference face images of multiple reference persons of different genders and ages, each captured from several different face orientations. Among these, the reference feature vector most similar to the target feature vector of a target face image (a customer of unknown gender and age, captured from an arbitrary orientation) is identified, and the orientation range in which the corresponding reference face image was captured is taken as the orientation range of the target face image. Then, among the feature vectors of reference face images captured within that estimated range, the one most similar to the target feature vector is identified, and the gender or age of the corresponding reference person is taken as the customer's gender or age.
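The two-stage lookup in Patent Document 12 (first estimate the orientation range, then compare only within that range) can be sketched as a pair of nearest-neighbour searches. The summary above uses the same "reference feature vector" in both stages; as a simplifying assumption, this sketch gives each reference a hypothetical pose descriptor for stage 1 and a hypothetical appearance descriptor for stage 2, and uses Euclidean distance as the similarity measure. The class and function names are illustrative only.

```python
import math
from dataclasses import dataclass


@dataclass
class ReferenceFace:
    pose_vector: tuple  # hypothetical orientation descriptor
    face_vector: tuple  # hypothetical appearance descriptor
    orientation: str    # orientation range label, e.g. "frontal"
    gender: str
    age_group: str


def estimate_gender_age(target_pose, target_face, references):
    """Two-stage nearest-neighbour estimate (illustrative sketch)."""
    # Stage 1: the most similar reference decides the orientation range.
    nearest = min(references, key=lambda r: math.dist(target_pose, r.pose_vector))
    orientation = nearest.orientation
    # Stage 2: compare appearance only against references in that range.
    in_range = [r for r in references if r.orientation == orientation]
    best = min(in_range, key=lambda r: math.dist(target_face, r.face_vector))
    return best.gender, best.age_group
```

The point of the first stage is that a reference photographed from a different orientation cannot decide the result in the second stage, even if its appearance vector happens to lie closest to the target's.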

Patent Document 13 is an example of a technique for recognizing characters in an image and estimating the language of the recognized characters. A camera-equipped mobile phone has a recognition processing unit that recognizes images of characters belonging to any of several languages and converts them into character codes. The recognition processing unit includes a language possibility value estimation unit and an operation unit. For each language, the estimation unit stores in a history information storage unit estimation history information indicating that a character recognition result was estimated to be in that language. The operation unit stores in the same storage unit operation history information indicating that a user operation designating the language of the characters to be recognized was received. Character images are then converted into character codes with reference to the history information in the history information storage unit.

Patent Document 14 is an example of a technique for selecting, from speech, an acoustic model suited to the speaker's gender and age. After the speech is converted into a known speech recognition feature such as MFCCs, matching is performed using a language model and multiple acoustic models categorized by gender or age group. Among the recognition results obtained with each acoustic model, the acoustic model with the highest average of the top N likelihoods or word confidence scores is selected.
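The selection rule at the end of that summary (highest average of the top N scores) is simple enough to state as code. The sketch below assumes the per-model recognition scores have already been computed; the model names, the default N = 3, and the function name `select_acoustic_model` are hypothetical.

```python
def select_acoustic_model(scores_by_model, n=3):
    """Pick the acoustic model whose top-n recognition scores average highest.

    scores_by_model: model name -> list of likelihoods (or word confidence
    scores) of the recognition results obtained with that model.
    """
    def top_n_mean(scores):
        top = sorted(scores, reverse=True)[:n]
        # A model with no recognition results can never be selected.
        return sum(top) / len(top) if top else float("-inf")

    return max(scores_by_model, key=lambda name: top_n_mean(scores_by_model[name]))
```

For example, with scores `{"male_adult": [0.9, 0.2, 0.1, 0.1], "female_adult": [0.7, 0.6, 0.6, 0.0], "child": [0.5, 0.5, 0.5, 0.5]}` the top-3 averages are 0.40, about 0.63, and 0.50, so `"female_adult"` is chosen even though `"male_adult"` produced the single best score. Averaging the top N rather than taking the maximum makes the choice less sensitive to one lucky hypothesis.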

Patent Document 1: JP-A-2005-250712
Patent Document 2: JP-A-2007-80057
Patent Document 3: JP-A-2001-188556
Patent Document 4: JP-A-10-116093
Patent Document 5: JP-A-2007-272896
Patent Document 6: JP-A-2008-3749
Patent Document 7: JP-A-2008-15641
Patent Document 8: JP-A-2001-209802
Patent Document 9: JP-A-2005-122287
Patent Document 10: JP-A-2005-78646
Patent Document 11: JP-A-11-169357
Patent Document 12: JP-A-2008-282089
Patent Document 13: JP-A-2006-331354
Patent Document 14: JP-A-2008-96577

When information is delivered to an unspecified large number of people, for example by digital signage in airports visited by people from many countries or on the main streets of large cities, it is effective to select or change the content according to each person's nationality. (In this specification, "nationality" does not mean legal nationality in the strict sense, but information representing some international category to which the individual belongs.)

Patent Documents 1 to 5 respectively perform gender and age estimation, similarity matching against sample faces, language recognition, and clothing recognition, but selecting the content to deliver from any one of these elements alone is not accurate enough.

The present invention objectively measures an individual's external characteristics, including biological characteristics such as race and physical traits, cultural-anthropological characteristics such as ethnicity, and linguistic characteristics such as the native language or dialect of the spoken language. Based on the measurement results, it automatically and comprehensively determines the individual's nationality and executes actions accordingly.

A nationality determination device includes: an image input unit that inputs an image; an image nationality determination unit that extracts multiple feature amounts relating to a person's attributes from the input image and, based on each extracted feature amount, individually determines the person's nationality corresponding to that feature amount; and a final nationality determination unit that determines the person's final nationality based on the nationalities individually determined for the respective feature amounts by the image nationality determination unit.

Preferably, the feature amounts include feature amounts of a face region detected in the image and feature amounts of the region surrounding the face region.

Preferably, the feature amounts of the face region include the color, position, and shape of the facial parts.

Preferably, the feature amounts of the region surrounding the face region include at least one of character (text) information and clothing information.

Preferably, the device further includes a voice input unit that inputs voice and a voice nationality determination unit that determines, based on the input voice, the nationality of the person corresponding to the voice. The final nationality determination unit then determines the person's final nationality based on both the nationalities determined by the image nationality determination unit for the individual feature amounts and the nationality determined by the voice nationality determination unit.

Preferably, the nationality of the person in the image can thus be determined comprehensively from both the image and the content of the utterance.

Preferably, the voice nationality determination unit recognizes the spoken language from the voice and determines the nationality corresponding to the voice based on the recognized language.

Preferably, the image nationality determination unit extracts from the image feature amounts relating to the person's universal attributes (attributes, such as gender and age, that do not serve as nationality criteria) and determines the universal attributes of the person in the image from those feature amounts; the voice nationality determination unit determines the universal attributes of the speaker from the voice; and the final nationality determination unit determines the person's final universal attributes based on the universal attributes determined by both units.

Preferably, the person's final universal attributes include at least one of gender and age.

Preferably, the final nationality determination unit determines the person's final nationality based on priorities defined in advance for the nationalities determined by the image nationality determination unit for the individual feature amounts and for the nationality determined by the voice nationality determination unit.
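The text does not specify how the predefined priorities are applied. One plausible reading, sketched below under that assumption, is a weighted vote: every cue (spoken language, text near the face, clothing, facial features) contributes its priority to the nationality it decided, the nationality with the largest total wins, and ties go to the nationality backed by the single highest-priority cue. The cue names, priority values, and function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical priorities (higher = more trusted); not taken from the source.
CUE_PRIORITY = {
    "spoken_language": 5,
    "text_near_face": 3,
    "clothing": 2,
    "face_features": 1,
}


def final_nationality(cue_results, priority=CUE_PRIORITY):
    """Combine per-cue nationality decisions into one final nationality.

    cue_results: cue name -> nationality decided from that cue,
                 or None when the cue was unavailable or undecided.
    Returns the final nationality, or None if no cue gave a decision.
    """
    total = defaultdict(int)      # summed priority per nationality
    strongest = defaultdict(int)  # highest single cue priority per nationality
    for cue, nationality in cue_results.items():
        if nationality is None:
            continue
        weight = priority.get(cue, 0)
        total[nationality] += weight
        strongest[nationality] = max(strongest[nationality], weight)
    if not total:
        return None
    # Highest summed priority wins; ties go to the nationality backed by
    # the single highest-priority cue.
    return max(total, key=lambda nat: (total[nat], strongest[nat]))
```

Because a missing cue is simply passed as `None`, the same function covers the case described later in which only the image or only the voice is available.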

Preferably, the device includes a final determination result output unit that outputs information indicating the final nationality determined by the final nationality determination unit to a predetermined playback device.

Preferably, the device includes a playback information storage unit that stores final nationalities in association with desired playback information, and a playback information output unit that extracts from the playback information storage unit the playback information corresponding to the final nationality determined by the final nationality determination unit and outputs it to a predetermined playback device.

In the nationality determination method according to the present invention, a computer performs the steps of: inputting an image; extracting multiple feature amounts relating to a person's attributes from the input image and, based on each extracted feature amount, individually determining the person's nationality corresponding to that feature amount; and determining the person's final nationality based on the individually determined nationalities.

A program for causing a computer to execute the nationality determination method is also included in the present invention.

According to the present invention, the nationality of a person in an image can be determined comprehensively from the individual feature amounts obtained from the image and, additionally, from voice. Information corresponding to that nationality can then be played back, so that content suited to the person's nationality is provided.

FIG. 1: Schematic configuration diagram of the nationality determination system
FIG. 2: Flowchart of the nationality determination process
FIG. 3: Example of the information in the nationality information DB

FIG. 1 is a schematic configuration diagram of a nationality determination system 100 according to a preferred embodiment of the present invention. The system includes an image input device 1, an image analysis device 2, a voice input device 3, a voice analysis device 4, a nationality determination device 5, a nationality information DB 6, a nationality correspondence information DB 7, and a display device 8. The system may be implemented on a single personal computer (with an arithmetic circuit, data input/output circuit, display circuit, operation device, communication circuit, etc.) or on several personal computers connected via a network. For example, the nationality information DB 6 and the nationality correspondence information DB 7 may reside on a server computer and the other devices on client computers. The devices therefore need not be installed together in one place. For example, the image input device 1, the voice input device 3, and the display device 8 can be installed in conspicuous places such as airport lobbies, department store sales floors, the walls of underground passages, or above train doors, while the image analysis device 2, the voice analysis device 4, the nationality determination device 5, the nationality information DB 6, and the nationality correspondence information DB 7 can be placed out of sight, for example in the control room of the airport or underground passage.

The image input device 1 inputs images (still images or moving images) to the image analysis device 2. It may be an imaging device itself or an interface that transfers images captured by another imaging device to the image analysis device 2. The subjects of the input images are people photographed by an imaging device installed in a place with heavy foot traffic of unspecified people. For example, if an imaging device built into or connected to the image input device 1 is installed, as described above, in an airport lobby, a department store sales floor, an underground passage wall, or above a train door, the subjects are the people using the airport lobby, department store, underground passage, or train.

The image analysis device 2 analyzes the input image and outputs the analysis result to the nationality determination device 5. Any analysis method may be used. For example, face detection is performed to extract a face region. Any known face detection method can be adopted, such as: detection by edge detection or shape pattern detection; a feature point vector approximation method, in which feature points (the coordinates of characteristic parts) are vectorized and the feature point vector is detected approximately; region detection by hue detection or skin color detection; or face discrimination based on the correlation value with a template, as in Japanese Patent No. 4127521. As described later, feature amounts relating to various human attributes are then calculated from the detected face. If the image analysis device 2 and the image input device 1 are connected remotely, images must be transmitted between them; installing both at the same or nearby locations makes this unnecessary.
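As an illustration of the last option, face discrimination by correlation with a template, the sketch below slides a grayscale template over a grayscale image and scores each window with zero-mean normalized cross-correlation (NCC). NCC is one common choice of correlation value and is an assumption here; Japanese Patent No. 4127521 may define its correlation differently. The function names and the 0.8 acceptance threshold are hypothetical.

```python
import math


def ncc(patch, template):
    """Zero-mean normalized cross-correlation of two equal-size 2D lists."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b))
    # A flat (constant) window carries no structure: treat it as no match.
    return num / den if den else 0.0


def best_template_match(image, template, threshold=0.8):
    """Return (score, x, y) of the best window scoring >= threshold, or None."""
    th, tw = len(template), len(template[0])
    best = None
    for y in range(len(image) - th + 1):
        for x in range(len(image[0]) - tw + 1):
            window = [row[x:x + tw] for row in image[y:y + th]]
            score = ncc(window, template)
            if score >= threshold and (best is None or score > best[0]):
                best = (score, x, y)
    return best
```

Subtracting the means and normalizing makes the score insensitive to uniform changes in brightness and contrast, which matters for cameras in lobbies and underground passages where lighting varies. A practical detector would also scan several template scales; that is omitted here.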

The voice input device 3 includes a microphone that picks up sound and converts it into an analog voice signal, an amplifier that amplifies the analog signal output by the microphone, and a converter that converts the amplified analog signal into digital voice data. As with the image subjects, the speakers of the input voice are the people photographed by the imaging device installed in a place with heavy foot traffic; that is, the image input device 1 and the voice input device 3 are installed at the same or nearby locations. Note that nationality can also be determined from only one of the two inputs when the other is unavailable, so the present invention does not require the image input from the image input device 1 and the voice input from the voice input device 3 to be present at all times.

The voice analysis device 4 includes a voice database storing speech patterns of multiple languages (including dialects) and a language identification device that identifies the language of the collected utterance by matching the digital voice data output by the voice input device 3 against the voice database. If the voice analysis device 4 and the voice input device 3 are connected remotely, voice data must be transmitted between them; installing both at the same or nearby locations makes this unnecessary.

The nationality determination device 5 includes the circuits needed for arithmetic processing, such as a CPU, RAM, and ROM, and determines a person's nationality based on the analysis result from the image analysis device 2 and the language identified by the voice analysis device 4. The information serving as criteria for nationality determination is stored in the nationality information DB 6. The nationality determination device 5 may also perform overall control of the nationality determination system 100.

The display device 8, such as a liquid crystal display, displays video output according to the nationality determined by the nationality determination device 5. The display device 8 is installed at the same location as, or close to, the image input device 1 and the voice input device 3. Therefore, when a person is near the image input device 1 and the voice input device 3, video corresponding to that person's nationality, as determined by the nationality determination device 5, can be shown to that person.

The nationality correspondence information DB 7 stores in advance each final nationality together with the corresponding output information (video including text and images, audio, or a combination of both). The output information may additionally be associated with universal information that is not a criterion for classifying nationality, such as gender or age; for example, the output information for a given nationality may be further classified and individualized for men and women. The nationality determination device 5 extracts the output information corresponding to the determined nationality from the nationality correspondence information DB 7 and outputs it to the display device 8. The nationality information DB 6 and the nationality correspondence information DB 7 are held on a storage medium such as an HDD.
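A minimal sketch of the lookup against the nationality correspondence information DB 7, with the optional further split by gender, is shown below. The in-memory dict stands in for the database; the file names, the country codes, the default content, and the most-specific-first fallback order are all assumptions for illustration, not details from the source.

```python
# Hypothetical contents of the nationality correspondence information DB 7:
# (nationality, gender) -> output information; gender None means "any gender".
CONTENT_DB = {
    ("JP", "female"): "signage_ja_female.mp4",
    ("JP", None): "signage_ja.mp4",
    ("FR", None): "signage_fr.mp4",
}
DEFAULT_CONTENT = "signage_default.mp4"  # assumed fallback, not in the source


def select_output(nationality, gender=None, db=CONTENT_DB, default=DEFAULT_CONTENT):
    """Return the most specific output information for a person."""
    # Try nationality + gender first, then nationality alone.
    for key in ((nationality, gender), (nationality, None)):
        if key in db:
            return db[key]
    return default
```

Falling back from the (nationality, gender) entry to the nationality-only entry means gender-specific content only has to be registered where it is actually wanted; everything else shares the per-nationality content.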

The nationality determination process executed by the nationality determination system 100 is described below with reference to the flowchart of FIG. 2. The process is controlled by the nationality determination device 5. The program for this control is stored in a computer-readable storage medium, such as the ROM of the nationality determination device 5, from which the device reads and executes it.

In S1, the image input device 1 inputs an image. Any input method may be used, such as input from a camera or input of existing still-image or moving-image data. However, to reflect the attributes of the subject in the content played on the display device 8 without delay, it is desirable to input captured images in real time. The nationality determination device 5 may also command when images are captured and input; for example, it may instruct the capture of one still image every minute, or of a 10-second moving image every minute.

In S2, the image analysis device 2 detects face regions in the input image and calculates feature amounts from the detected faces. When the image is a moving image, the face of the same person is detected across multiple frames captured within a certain time, in which the face appears with different orientations and expressions.

When a single image contains multiple faces, or multiple faces appear across different images, the faces to be subjected to the subsequent nationality determination may be chosen, at the time the image is input, according to the following conditions.

(1) All faces included in the image.

(2) Faces in the image that satisfy a predetermined condition (for example, a face size at or above a threshold, such as 16 × 16 pixels or larger).

(3) Faces in the image selected by an authorized user via an operation device 9 (a keyboard, mouse, touch panel, etc.).

If multiple faces satisfy the condition, the subsequent processing is executed for each face.
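The three face-selection conditions can be expressed as a small filter. In this sketch, the `DetectedFace` record, the condition names, and the function name are hypothetical; the 16-pixel default mirrors the 16 × 16 example in condition (2), and requiring both width and height to reach it is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectedFace:
    face_id: int
    width: int   # face bounding-box width in pixels
    height: int  # face bounding-box height in pixels


def select_faces(faces, condition="all", min_size=16, user_selected_ids=frozenset()):
    """Choose the faces that go on to nationality determination.

    condition: "all" for (1), "min_size" for (2), "user" for (3).
    """
    if condition == "all":
        return list(faces)
    if condition == "min_size":
        return [f for f in faces if f.width >= min_size and f.height >= min_size]
    if condition == "user":
        # IDs chosen by an authorized user via the operation device 9.
        return [f for f in faces if f.face_id in user_selected_ids]
    raise ValueError(f"unknown condition: {condition!r}")
```

Each face in the returned list would then be passed individually to the feature extraction of S3, matching the per-face processing described above.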

In S3, the image analysis device 2 calculates facial feature amounts from the extracted face. Examples of facial feature amounts and how they are calculated are as follows.

(1) Contour shape. See, for example, Patent Document 7.

(2) Facial skin color. For example, as in paragraph 0054 of Patent Document 7, the color information of the region whose color information is determined to satisfy a predetermined conditional expression (the skin-color region) is taken as the skin color.

(3) Eye position, eye shape, and pupil color. Eye positions can be obtained with the feature-point extraction of Patent Document 6 (the outer and inner corners of both eyes). For eye shape, the eye contour extracted as in Patent Document 8 is used. For pupil color, the color information of the pupil region extracted as in Patent Document 9 is used.

(4) Nose position and shape. The nose position can be obtained with the feature-point extraction of Patent Document 6 (left and right alae of the nose, and so on). The nose shape can be extracted as in Patent Document 10.

(5) Lip position and shape. The lip position can be obtained with the feature-point extraction of Patent Document 6 (left and right corners of the mouth, midpoint of the upper lip, midpoint of the lower lip, and so on). The lip shape can be extracted as in Patent Document 10.

(6) Hairstyle, hair position, and hair color. The hairstyle can be extracted as in Patent Document 11, and the hair position and color can be derived from the location and color of the identified hairstyle.

Alternatively, age and gender may be estimated from the above facial feature amounts (see Patent Document 12). Other known techniques may also be used to calculate feature amounts indicating any human attribute that can be analyzed from an image; the feature amounts are not limited to those listed above.

In S4, the image analysis device 2 extracts feature amounts related to the person's attributes from the region surrounding the face area in the input image (excluding the face area itself). For example, clothing is extracted as in Patent Document 5. Extraction is not limited to garments worn below the neck; accessories such as hats and scarves may also be extracted. Alternatively, as in Patent Document 12, character recognition may be applied to the image to recognize text near the person's face area, such as text in a book, newspaper, or magazine the person is holding, or a logo printed on the person's clothes, and the language of the recognized text is then estimated. The peripheral region may be defined freely relative to the face area; for example, it may be the region obtained by enlarging the height and width of the face area by a predetermined factor (2×, 4×, etc.) and then removing the face area itself.
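The example definition of the peripheral region (enlarge the face box, then remove the face box) can be sketched as follows. The text does not say where the enlargement is anchored, so this sketch assumes it is centred on the face and clipped to the image bounds; `extended_box` and `in_peripheral` are hypothetical helpers.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels, x1/y1 exclusive


def extended_box(face: Box, img_w: int, img_h: int, scale: float = 2.0) -> Box:
    """Enlarge the face box by `scale` about its centre, clipped to the image."""
    x0, y0, x1, y1 = face
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    half_w, half_h = (x1 - x0) * scale / 2, (y1 - y0) * scale / 2
    return (max(0, int(cx - half_w)), max(0, int(cy - half_h)),
            min(img_w, int(cx + half_w)), min(img_h, int(cy + half_h)))


def in_box(px: int, py: int, box: Box) -> bool:
    return box[0] <= px < box[2] and box[1] <= py < box[3]


def in_peripheral(px: int, py: int, face: Box, ext: Box) -> bool:
    """True for pixels inside the enlarged box but outside the face box itself."""
    return in_box(px, py, ext) and not in_box(px, py, face)


face = (40, 40, 60, 60)                  # 20 x 20 face box
ext = extended_box(face, 200, 200, 2.0)  # (30, 30, 70, 70)
```

Clothing or text recognition in S4 would then be run only on pixels for which `in_peripheral` is true.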

In S5, audio is input from the voice input device 3. The nationality determination device 5 is assumed to control image input (capture timing) and audio input so that they are synchronized.

In S6, the voice analysis device 4 begins voice analysis. For example, as in Patent Documents 3 and 4, the spoken language is identified by matching the input speech against sample speech patterns in a spoken-language database. Instead of, or in addition to, image analysis, the speaker's gender and age can also be determined from the voice. For example, as in Patent Document 14, the input speech is matched against speech models for each gender and age group, and the gender and age of the model with the highest degree of match are taken as the speaker's gender and age.
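Both uses of the voice in S6 reduce to "pick the reference model with the highest degree of match". The sketch below assumes that the input speech and each reference model have already been reduced to fixed-length feature vectors and uses cosine similarity as the match score; neither the representation nor the similarity measure is specified in the text, and the toy 3-dimensional vectors are purely illustrative.

```python
import math
from typing import Dict, Hashable, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def best_match(features: Sequence[float],
               models: Dict[Hashable, Sequence[float]]) -> Tuple[Hashable, float]:
    """Return (label, score) of the reference model most similar to `features`."""
    if not models:
        raise ValueError("no reference models")
    return max(((label, cosine(features, ref)) for label, ref in models.items()),
               key=lambda item: item[1])


# Hypothetical toy reference models (language; gender and age group).
language_models = {"Japanese": [1.0, 0.1, 0.0], "English": [0.1, 1.0, 0.2]}
speaker_models = {("male", "40s"): [0.9, 0.2, 0.1], ("female", "20s"): [0.1, 0.3, 1.0]}
lang, _ = best_match([0.8, 0.2, 0.1], language_models)
(gender, age), _ = best_match([0.8, 0.2, 0.1], speaker_models)
```

Real language identification works on much richer acoustic representations; the point here is only the argmax-over-models structure.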

In S7, the nationality determination device 5 determines the person's nationality, gender, and age based on the analysis results of the image analysis device 2 and the voice analysis device 4.

First, the nationality corresponding to each pattern that the image analysis device 2 can output as an analysis result is stored in advance in the nationality information DB 6. The nationality determination device 5 then extracts from the nationality information DB 6 the nationality information corresponding to each actual analysis result of the image analysis device 2 (from S3 or S4).

For example, the nationality information DB 6 associates the shapes of the face and facial parts (eyes, nose, lips), or feature amounts representing those shapes, with regions of origin, such as Caucasoid skeletal structure = Europe, Mongoloid skeletal structure = Asia, and Negroid skeletal structure = Africa. The nationality determination device 5 identifies from the nationality information DB 6 the region of origin corresponding to the face and facial-part shapes actually obtained by the image analysis device 2, and extracts it as face-based nationality information a.

The nationality information DB 6 also associates the skin color of the face area (or a feature amount representing that color) with regions of origin, such as brown = Asia, white = Europe, and black = Africa (see FIG. 3). The nationality determination device 5 identifies from the nationality information DB 6 the region of origin corresponding to the skin color of the face area actually obtained by the image analysis device 2, and extracts it as skin-color-based nationality information b.

The nationality information DB 6 may also associate combinations of face and facial-part shape with skin color and the corresponding regions of origin; for example, Caucasoid skeletal structure with brownish skin = Hispanic (Latin America).

Alternatively, the nationality information DB 6 stores image feature patterns of national costumes, such as sari = India, chima jeogori = Korean Peninsula, turban = Middle East, and kimono = Japan, in association with their regions of origin. The nationality determination device 5 identifies from the nationality information DB 6 the region of origin corresponding to the clothing feature amount actually obtained by the image analysis device 2, and extracts it as costume-based nationality information c.

Alternatively, the nationality information DB 6 associates languages with regions of origin, such as Japanese = Japan, English (American English) = United States, English (British English) = the Commonwealth, Spanish = Spain or Latin America, Mandarin = the Beijing area, and Cantonese = Guangdong Province, Hong Kong, and Macau. The nationality determination device 5 identifies from the nationality information DB 6 the region of origin corresponding to the language actually obtained by the image analysis device 2, and extracts it as image-periphery-based nationality information d. The language here is written language, but the same association can be made for spoken language.

That is, the nationality determination device 5 identifies from the nationality information DB 6 the region of origin corresponding to the language obtained by the voice analysis device 4, and extracts it as utterance-based nationality information e. Regions of origin may or may not be subdivided by dialect within a single language. However, particularly for languages with large native-speaker populations (Chinese, English, Spanish, etc.), subdividing regions of origin as finely as spelling (in written language) or accent (in spoken language) allows makes the person's nationality determination more accurate.

In this way, the nationality determination device 5 extracts from the nationality information DB 6 the nationality information corresponding to each individual analysis result of the image analysis device 2 or the voice analysis device 4. If an individual analysis fails, or if the nationality information DB 6 contains no nationality information corresponding to the analysis result, the nationality information is set to "unknown".
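The per-analysis tables of the nationality information DB 6 and the "unknown" rule can be pictured as a dictionary lookup with a fallback. This is a minimal sketch, assuming each analysis returns a symbolic label (or None on failure); `NATIONALITY_DB` is a hypothetical excerpt populated only with examples from the text, and the key names are illustrative.

```python
from typing import Dict, Optional

UNKNOWN = "unknown"

# Hypothetical excerpt of nationality information DB 6, one table per analysis
# type; entries are taken from the examples in the text.
NATIONALITY_DB: Dict[str, Dict[str, str]] = {
    "face": {"caucasoid": "Europe", "mongoloid": "Asia", "negroid": "Africa"},   # a
    "skin": {"brown": "Asia", "white": "Europe", "black": "Africa"},             # b
    "costume": {"sari": "India", "kimono": "Japan", "turban": "Middle East"},    # c
    "language": {"Japanese": "Japan", "American English": "United States",       # d, e
                 "Cantonese": "Guangdong, Hong Kong, Macau"},
}


def lookup(kind: str, result: Optional[str]) -> str:
    """Map one analysis result to nationality information, or "unknown".

    `result` is None when the analysis itself failed; a result or analysis
    type with no entry in the DB also yields "unknown".
    """
    if result is None:
        return UNKNOWN
    return NATIONALITY_DB.get(kind, {}).get(result, UNKNOWN)
```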

Next, the nationality determination device 5 determines the person's final nationality based on the individual pieces of nationality information extracted from the nationality information DB 6. For example, a priority is defined in advance, in a storage medium such as the nationality information DB 6, for the individual nationality information corresponding to each type of analysis result, and the individual nationality information with the highest priority is taken as the final nationality.

For example, suppose a 40-year-old Japanese man with a light skin tone, wearing a suit, is holding an English-language newspaper containing an article spelled in American English, and is speaking Japanese. Suppose also that capturing his image and recording his speech yields nationality information a = Asia, b = Asia, c = unknown, d = English, e = Japan, gender = male, and age = early 40s.

Suppose further that the nationality information DB 6 stores the priority order utterance-based nationality information e > face-based nationality information a > costume-based nationality information c > skin-color-based nationality information b > image-periphery-based nationality information d. Since utterance-based nationality information e ("Japan") ranks highest, the nationality determination device 5 finally determines the person's nationality to be "Japan". Gender and age are not used in the final nationality determination.

Ranking the more reliable types of nationality information higher improves the accuracy of the determination. However, to allow a determination suited to the operating environment, such as the installation location of the nationality determination system 100, an authorized user may be able to set the order freely via the operation device 9. For example, if the nationality determination system 100 is installed in a busy, noisy place, voice-based nationality determination is likely to be less accurate, so the priority of utterance-based nationality information e should be lowered.

Not all categories of nationality information a to e are always available; missing nationality information is excluded from the priority-based determination. For example, if video was captured but audio was not, utterance-based nationality information e does not exist, and the highest-ranked of the remaining nationality information is taken as the final nationality.
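The priority rule, including the exclusion of missing or unknown categories, is a first-available scan over the configured order. This is a minimal sketch under the assumption that nationality information is held as a dict keyed by category letter; `decide_by_priority` is a hypothetical name, and the data reproduce the worked example above.

```python
from typing import Dict, Optional, Sequence

UNKNOWN = "unknown"


def decide_by_priority(info: Dict[str, str], priority: Sequence[str]) -> Optional[str]:
    """Return the highest-priority nationality information that is available.

    Categories that are missing from `info` or marked "unknown" are skipped,
    i.e. excluded from the ranking. Returns None if nothing is available.
    """
    for category in priority:
        value = info.get(category, UNKNOWN)
        if value != UNKNOWN:
            return value
    return None


# Worked example from the text: e > a > c > b > d.
order = ["e", "a", "c", "b", "d"]
info = {"a": "Asia", "b": "Asia", "c": UNKNOWN, "d": "English", "e": "Japan"}
print(decide_by_priority(info, order))      # Japan
no_audio = {k: v for k, v in info.items() if k != "e"}
print(decide_by_priority(no_audio, order))  # Asia
```

Because the order is plain data, lowering the rank of e for a noisy installation only means passing a different list, for example `["a", "c", "b", "d", "e"]`.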

The final nationality determination need not be based on a priority order. For example, the majority value among the individual nationality information (the nationality that occurs most often) may be taken as the final nationality. With a = Asia, b = Asia, c = unknown, d = English, and e = Japan, there are two Asia, one English, and one Japan; Asia is the majority, so Asia becomes the final nationality. However, since Japan is part of Asia, Japan can be regarded as included in the majority Asia, and in this case Japan, which is a narrower concept than Asia, may instead be taken as the final nationality.
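The majority rule with the optional refinement to a narrower region can be sketched as below. The containment table `PARENT` is hypothetical, and refining only when exactly one distinct narrower region was observed is an assumption of this sketch; the text describes the single-child case (Japan under Asia) but not what happens with several.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

UNKNOWN = "unknown"

# Hypothetical containment table: narrower region -> broader region.
PARENT: Dict[str, str] = {"Japan": "Asia", "India": "Asia", "Korean Peninsula": "Asia"}


def decide_by_majority(values: Iterable[str], refine: bool = True) -> Optional[str]:
    """Majority vote over known nationality information.

    Returns None when everything is unknown or the top count is tied. With
    refine=True, if exactly one distinct narrower region of the winner was
    also observed (e.g. Japan under Asia), that narrower region is returned.
    """
    known = [v for v in values if v != UNKNOWN]
    if not known:
        return None
    counts = Counter(known)
    top_count = max(counts.values())
    leaders = [v for v, c in counts.items() if c == top_count]
    if len(leaders) != 1:
        return None  # tie: no unique majority
    winner = leaders[0]
    if refine:
        narrower = {v for v in known if PARENT.get(v) == winner}
        if len(narrower) == 1:
            return narrower.pop()
    return winner


votes = ["Asia", "Asia", UNKNOWN, "English", "Japan"]
print(decide_by_majority(votes, refine=False))  # Asia
print(decide_by_majority(votes))                # Japan
```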

Universal information (gender and age) is likewise determined comprehensively from the gender and age obtained by image analysis and voice analysis. Age need not be a single number; it may be a numerical range (age group) such as early teens, late teens, or early 20s. For gender, if the genders obtained by image analysis and voice analysis agree, that gender is taken as the final gender; if they disagree, the gender is determined to be "unknown". For age, the union (logical OR) of the age groups obtained by image analysis and voice analysis is taken as the final age group. Alternatively, the intersection (logical AND) of the two results, that is, the ages common to both, may be taken as the final age group; if there is no overlap, the age is determined to be "unknown".
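The gender and age rules can be sketched with age groups represented as inclusive integer ranges, for example (40, 44) for "early 40s". That representation is an assumption, as is returning two separate ranges when the OR of two disjoint groups cannot be written as one range; the function names are hypothetical.

```python
from typing import List, Optional, Tuple

AgeRange = Tuple[int, int]  # inclusive years, e.g. (40, 44) for "early 40s"
UNKNOWN = "unknown"


def fuse_gender(image_gender: Optional[str], voice_gender: Optional[str]) -> str:
    """Keep the gender only when image and voice analysis agree."""
    if image_gender is not None and image_gender == voice_gender:
        return image_gender
    return UNKNOWN


def fuse_age_or(a: AgeRange, b: AgeRange) -> List[AgeRange]:
    """Union (OR) of two age ranges; a single range if they overlap or touch."""
    (lo1, hi1), (lo2, hi2) = sorted([a, b])
    if lo2 <= hi1 + 1:
        return [(lo1, max(hi1, hi2))]
    return [(lo1, hi1), (lo2, hi2)]


def fuse_age_and(a: AgeRange, b: AgeRange) -> Optional[AgeRange]:
    """Intersection (AND) of two age ranges; None means "age unknown"."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


# Image analysis says early 40s, voice analysis says 42 to 47.
print(fuse_age_or((40, 44), (42, 47)))   # [(40, 47)]
print(fuse_age_and((40, 44), (42, 47)))  # (42, 44)
```

OR is the more permissive choice (it rarely yields "unknown"), while AND gives a narrower group at the cost of failing whenever the two analyses disagree.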

In S8, the nationality finally determined in S7 is displayed on the display device 8. Instead of, or together with, the determined nationality itself, output information (video) corresponding to the finally determined nationality may be acquired from the nationality correspondence information DB 7 and displayed. In addition, although not illustrated, if the nationality determination system 100 includes a known audio reproduction device 10 such as an audio decoder, amplifier, and speaker, and the output information includes audio, the audio corresponding to the finally determined nationality may be played on the audio reproduction device 10. The output information may of course include both video and audio, played in synchronization on the display device 8 and the audio reproduction device 10.

For example, when the person's nationality is finally determined to be "Japan" as described above, output information corresponding to "Japan" (such as Japanese-language airport guidance or advertising messages) is played as video or audio. If the output information for "Japan" is further subcategorized by universal information (gender and age) in the nationality correspondence information DB 7, the output information corresponding to the final nationality "Japan", the determined age "40s", and the determined gender "male" is acquired from the nationality correspondence information DB 7 and played.

Alternatively, when the person's nationality is finally determined to be "Asia", output information corresponding to "Asia" (such as airport guidance or advertising messages written in Japanese, Korean, and Mandarin) is played as video and/or audio.
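Selecting output information from the nationality correspondence information DB 7, with optional subcategories by gender and age group, is a two-level lookup. This sketch assumes a `"*"` key holds the nationality-wide content and that the English default content is also used for nationalities with no entry; that second fallback is an assumption, since the text only states it for the all-unknown case. `CONTENT_DB` and `select_content` are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical excerpt of nationality correspondence information DB 7.
# Key (gender, age_group) holds subcategorized content; "*" is the
# nationality-wide fallback.
CONTENT_DB: Dict[str, Dict[object, str]] = {
    "Japan": {
        ("male", "40s"): "Japanese ad aimed at men in their 40s",
        "*": "Japanese airport guidance",
    },
    "Asia": {"*": "Airport guidance in Japanese, Korean and Mandarin"},
}
DEFAULT_CONTENT = "English guidance"  # assumed fallback when nothing matches


def select_content(nationality: Optional[str], gender: Optional[str] = None,
                   age_group: Optional[str] = None) -> str:
    """Most specific content for the nationality, falling back step by step."""
    entry = CONTENT_DB.get(nationality)
    if entry is None:
        return DEFAULT_CONTENT
    return entry.get((gender, age_group), entry.get("*", DEFAULT_CONTENT))
```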

When final nationalities have been determined for multiple persons, each person's final nationality and/or the corresponding output information may be displayed in association with that person in the input image, whether the image is a still image or a video. If there are multiple images, each image is displayed for a predetermined period, either one at a time or in a split-screen layout, together with the nationality and/or output information for each subject in that image. The association can be shown, for example, with a speech balloon placed near the subject.

When the final nationality is determined by majority as described above, a unique final determination may be impossible, for example because several values tie for the majority, or because the individual nationality information is mutually contradictory and no majority exists. In that case, the individual nationality information may be listed on the display device 8 as nationality candidates in order of likelihood, and the candidate that an authorized user selects via the operation device 9 may be taken as the final nationality. Alternatively, if every piece of individual nationality information is "unknown", default information stored in the nationality correspondence information DB 7, such as guidance or advertising messages in English, is acquired as the output information and displayed or played back as audio.
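The three outcomes described above (unique majority, ambiguous result needing operator selection, all unknown) can be separated as follows. The text says candidates are listed "in order of likelihood" without defining likelihood, so this sketch assumes a hypothetical per-category confidence score; `resolve` is likewise a hypothetical name.

```python
from collections import Counter
from typing import Dict, List, Tuple

UNKNOWN = "unknown"


def resolve(info: Dict[str, str],
            confidence: Dict[str, float]) -> Tuple[str, List[str]]:
    """Classify the outcome of a majority vote over nationality information.

    ("decided", [winner])   -> a unique majority exists
    ("ask_user", candidates) -> tie or contradiction; candidates are ordered by
                                the assumed per-category confidence score
    ("default", [])          -> every item is unknown; play default content
    """
    known = {k: v for k, v in info.items() if v != UNKNOWN}
    if not known:
        return ("default", [])
    counts = Counter(known.values())
    top = max(counts.values())
    leaders = [v for v, c in counts.items() if c == top]
    if len(leaders) == 1:
        return ("decided", leaders)
    ranked = sorted(known, key=lambda k: confidence.get(k, 0.0), reverse=True)
    candidates: List[str] = []
    for category in ranked:
        if known[category] not in candidates:
            candidates.append(known[category])
    return ("ask_user", candidates)


# Every category disagrees, so the operator chooses from an ordered list.
conflicting = {"a": "Asia", "b": "Europe", "c": UNKNOWN, "d": "English", "e": "Japan"}
scores = {"e": 0.9, "a": 0.7, "b": 0.5, "d": 0.2}
status, candidates = resolve(conflicting, scores)
```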

When playback of the output information acquired from the nationality correspondence information DB 7 is complete, the process can return to S1 and repeat. The start and end of this processing may be controlled automatically by a timer function, or an authorized user may be able to command the start, continuation, or end of processing at will via the operation device 9.

Through the processing described above, a person's nationality can be determined comprehensively and accurately, information whose content suits the determined nationality can be output, and the information can be presented in a form the person can easily understand.

1: image input device, 2: image analysis device, 3: voice input device, 4: voice analysis device, 5: nationality determination device, 6: nationality information DB, 7: nationality correspondence information DB, 8: display device

Claims

A nationality determination device comprising:
an image input unit for inputting an image;
an image nationality determination unit that extracts a plurality of feature amounts related to attributes of a person from the image input by the image input unit and, based on each extracted feature amount, individually determines nationality information of the person corresponding to that feature amount; and
a final nationality determination unit that ranks the individually determined nationality information based on the nationality information determined by the image nationality determination unit for each feature amount and a predefined priority order of nationality information, and determines the highest-ranked nationality information as the final nationality of the person;
wherein the final nationality determination unit excludes nationality information that could not be determined from the ranking of the nationality information.

The nationality determination device according to claim 1, wherein the individual feature amounts include a feature amount of a face area detected from the image and a feature amount of a peripheral region of the face area.

The nationality determination device according to claim 2, wherein the feature amount of the face area includes a color, a position, and a shape of a face part.

The nationality determination device according to claim 2 or 3, wherein the feature amount of the peripheral region of the face region includes at least one of character information and clothing information.

The nationality determination device further comprising:
a voice input unit for inputting voice; and
a voice nationality determination unit that determines, based on the voice input by the voice input unit, nationality information of the person corresponding to the voice;
wherein the final nationality determination unit ranks the individually determined nationality information based on the nationality information corresponding to the individual feature amounts determined by the image nationality determination unit, the nationality information corresponding to the voice determined by the voice nationality determination unit, and the predefined priority order of nationality information, and determines the highest-ranked nationality information as the final nationality of the person.

The nationality determination device according to claim 5, wherein the voice nationality determination unit recognizes the spoken language from the voice and determines the nationality corresponding to the voice based on the recognized spoken language.

The nationality determination device according to claim 5 or 6, wherein:
the image nationality determination unit extracts from the image a feature amount related to a universal attribute of the person, and determines the universal attribute of the person corresponding to the image based on the extracted feature amount;
the voice nationality determination unit determines the universal attribute of the person corresponding to the voice based on the voice; and
the final nationality determination unit determines a final universal attribute of the person based on the universal attribute corresponding to the image determined by the image nationality determination unit and the universal attribute corresponding to the voice determined by the voice nationality determination unit.

The nationality determination device according to claim 7, wherein the final universal attribute of the person includes at least one of gender and age.

The nationality determination device according to any one of claims 5 to 8, wherein the final nationality determination unit ranks the individually determined nationality information based on a priority order defined in advance for the nationality information of the person corresponding to the individual feature amounts determined by the image nationality determination unit and for the nationality information corresponding to the voice determined by the voice nationality determination unit, and determines the highest-ranked nationality information as the final nationality of the person.

The nationality determination device according to claim 1, further comprising a final determination result output unit that outputs information indicating the nationality of the final person determined by the final nationality determination unit to a predetermined playback device.

The nationality determination device according to any one of claims 1 to 10, further comprising:
a reproduction information storage unit that stores final nationalities of persons in association with desired reproduction information; and
a reproduction information output unit that extracts, from the reproduction information storage unit, the reproduction information corresponding to the final nationality of the person determined by the final nationality determination unit, and outputs the extracted reproduction information to a predetermined reproduction device.

A nationality determination method including the following steps performed by a computer:
inputting an image;
extracting a plurality of feature amounts related to attributes of a person from the input image and, based on each extracted feature amount, individually determining nationality information of the person corresponding to that feature amount;
ranking the individually determined nationality information based on the nationality information determined for each feature amount and a predefined priority order of nationality information, and determining the highest-ranked nationality information as the final nationality of the person; and
excluding nationality information that could not be determined from the ranking of the nationality information.

A program for causing a computer to execute the nationality determination method according to claim 12.