JP2010152866A - Sex-age identification method and device based on sound and image

Info

Publication number: JP2010152866A
Application number: JP2009182589A
Authority: JP
Inventors: Hejin Kim; Ho Seop Yoon; Dae Hwan Hwang
Original assignee: Electronics and Telecommunications Research Institute ETRI
Current assignee: Electronics and Telecommunications Research Institute ETRI
Priority date: 2008-12-23
Filing date: 2009-08-05
Publication date: 2010-07-08
Anticipated expiration: 2029-08-05
Also published as: KR101189765B1; JP4881980B2; KR20100073845A

Abstract

PROBLEM TO BE SOLVED: To provide a sex-age identification method and device based on sound and image.

SOLUTION: The present invention relates to an identification device and method capable of accurately computing sex and age by performing sound recognition and face recognition in combination, considering a mutual relevancy between sex information and age information. The sex-age identification method includes: a step of collecting image information and sound information; a sound information-used sex and age identification step of extracting at least one characteristic value for the collected sound information and identifying sex and age by use of the extracted characteristic value; a face information-used sex and age identification step of extracting at least one characteristic value for the collected image information and identifying sex and age using the extracted characteristic value; and a step of finally determining sex and age by performing combination operation of the sex and age identified using the sound information and the sex and age identified using the face information.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a method and an apparatus for identifying the gender and age of a specific person from input video information and voice information. More specifically, it relates to an identification apparatus and method that can accurately calculate gender and age by combining voice recognition and face recognition while taking into account the correlation between gender information and age information.

Conventional techniques for identifying a user's gender and age include methods that use a personal identification means such as an electronic resident card, methods that use face recognition, and methods that use voice recognition.

An age recognition method using an electronic resident card (Korean Published Patent No. 1999-0008679), one of the methods that use a personal identification means, has the inconvenience that each individual must always carry the identification means. In addition, a personal identification means such as an electronic resident card is prone to loss, damage, and forgery.

A face recognition method used as a conventional gender-age identification technique determines gender and age from face image information alone, so it has difficulty reflecting the characteristics of each individual and its recognition accuracy is low. A recognition method based on voice recognition determines gender and age from voice information alone, so its recognition accuracy is poor when voice characteristics are similar, as with women and children.

Furthermore, conventional identification schemes based on face recognition or voice recognition cannot identify age and gender in a way that reflects the fact that the distribution of age features differs by gender, or that the distribution of gender features differs by age. As a result, their accuracy is low and their computational cost is high.

Korean Published Patent No. 1999-0008679

The present invention has been made in view of the above problems. Its object is to provide a gender-age identification method and apparatus that improve recognition accuracy by exploiting the correlation between gender information and age information and by combining voice recognition and face recognition.

To achieve the above object, a gender-age identification method according to one aspect of the present invention includes: a step of collecting video information and voice information; a voice-based gender and age identification step of extracting one or more feature values from the collected voice information and identifying gender and age using the extracted feature values; a face-based gender and age identification step of extracting one or more feature values from the collected video information and identifying gender and age using the extracted feature values; and a step of finally determining gender and age by performing a combination operation on the gender and age identified using the voice information and the gender and age identified using the face information.

A gender-age identification apparatus according to another aspect of the present invention includes: an input unit that collects video information and voice information; a voice processing unit that extracts feature values from the collected voice information and uses them to identify gender and age from the voice information; a video processing unit that extracts feature values from the collected video information and uses them to identify gender and age from the video information; and a final identification unit that finally determines the gender and age of the specific person by performing a combination operation on the gender and age identified by the video processing unit and the gender and age identified by the voice processing unit.

According to the present invention, voice recognition and face recognition are performed in combination, so recognition accuracy improves compared with conventional methods that use voice recognition alone or face recognition alone.

The present invention also recognizes age and gender in a way that reflects the correlation between gender information and age information, for example the fact that the distribution of age features differs by gender, or that the distribution of gender features differs by age. It can therefore ensure higher accuracy than conventional recognition methods.

Furthermore, in feature extraction, the present invention first groups the input voice information according to a feature that is easy to distinguish for each input, and then extracts feature values for each resulting group in a way that reflects that group's characteristics. This secures identification accuracy and, by eliminating redundant computation, enables fast identification.

FIG. 1 is a block diagram showing an embodiment of the gender-age identification apparatus according to the present invention.

FIG. 2 is a detailed block diagram of the voice processing unit of FIG. 1.

FIG. 3 is a detailed block diagram of the video processing unit of FIG. 1.

FIG. 4 is a flowchart of the gender-age identification method according to the present invention.

FIG. 5 is a detailed flowchart of the voice similarity identification step of FIG. 4.

FIG. 6 is a detailed flowchart of the video similarity identification step of FIG. 4.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram showing an embodiment of the gender-age identification apparatus according to the present invention.

As shown in FIG. 1, the gender-age identification apparatus according to the present invention includes an input unit 10, an age-gender calculation unit 20, and an output unit 30.

The input unit 10 collects video information and voice information of a specific person.

The input unit 10 can include a video information acquisition means, such as a camera, that acquires video information, and an acoustic information acquisition means, such as a microphone, that acquires acoustic information.

The input unit 10 can also include a face extraction means that separately extracts only the face information of the specific person from the video information acquired by the video information acquisition means, and a voice extraction means that separately extracts only the voice information of the specific person from the acoustic information acquired by the acoustic information acquisition means. In this case, the feature extraction means of the age-gender calculation unit 20 do not each need to extract the face information and voice information from the video and acoustic information every time, so computation is faster.

The face extraction means and the voice extraction means can be implemented using conventional techniques. For face extraction, for example, the face extraction means can be implemented using knowledge-based methods, feature-based methods, template-matching methods, appearance-based methods, thermal infrared (IR) methods, three-dimensional face recognition methods, multimodal methods, and the like.

The age-gender calculation unit 20 includes a voice processing unit 100 that identifies age and gender based on voice information, a video processing unit 200 that identifies age and gender based on video information, and a final identification unit 300 that determines age and gender by combining the calculation results of the voice processing unit 100 and the video processing unit 200.

The output unit 30 outputs the age and gender received from the age-gender calculation unit 20.

The age-gender calculation unit 20 is described in detail below with reference to FIGS. 2 and 3.

FIG. 2 is a detailed block diagram of the voice processing unit 100 of FIG. 1.

As shown in FIG. 2, the voice processing unit 100 includes a voice feature extraction unit 110 that extracts feature values from voice information and a voice calculation unit 120 that identifies gender and age from the extracted feature values.

More specifically, the voice feature extraction unit 110 extracts one or more feature values or feature vectors (hereinafter collectively called "feature values") from the voice information. The voice feature extraction unit 110 can extract feature values using the Linear Predictive Coefficient method, the cepstrum method, the Mel Frequency Cepstral Coefficient (MFCC) method, the filter bank energy method, or the like, or a combination of these.

The voice feature extraction unit 110 can obtain a plurality of feature values either by applying several of the above feature extraction methods to the same voice information, or by applying a single method to a plurality of samples. When feature values are obtained with N feature extraction methods over M voice samples, they can be expressed as an N × M matrix.
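The N × M arrangement can be illustrated with a minimal sketch. The two extractors below (`mean_energy` and `zero_crossing_rate`) are simple stand-ins chosen for illustration and are not the LPC, cepstrum, MFCC, or filter bank methods named in the text; the function names and the row/column orientation are assumptions.

```python
from typing import Callable, List, Sequence

Sample = Sequence[float]
Extractor = Callable[[Sample], float]


def mean_energy(sample: Sample) -> float:
    """Stand-in extractor: average squared amplitude of the sample."""
    return sum(x * x for x in sample) / len(sample)


def zero_crossing_rate(sample: Sample) -> float:
    """Stand-in extractor: fraction of adjacent pairs whose sign differs."""
    crossings = sum(1 for a, b in zip(sample, sample[1:]) if (a < 0) != (b < 0))
    return crossings / (len(sample) - 1)


def feature_matrix(extractors: List[Extractor], samples: List[Sample]) -> List[List[float]]:
    """Build an N x M matrix: row i holds extractor i applied to each of the M samples."""
    return [[extract(s) for s in samples] for extract in extractors]


# N = 2 extraction methods, M = 3 voice samples (toy data).
samples = [[1.0, -1.0, 1.0, -1.0], [0.5, 0.5, 0.5, 0.5], [0.0, 2.0, 0.0, -2.0]]
matrix = feature_matrix([mean_energy, zero_crossing_rate], samples)
```

Each row of `matrix` can then be passed to a combination step that reduces it to one representative value.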

In this embodiment of the present invention, to extract voice features accurately and quickly, the voice feature extraction unit 110 includes a gender feature extraction unit 111, an age-specific feature extraction unit-M 112, an age-specific feature extraction unit-FC 113, an age-specific feature extraction unit-F 114, and a gender feature extraction unit-C 115.

The gender feature extraction unit 111 extracts feature values from the input voice information in a way that reflects the differences between men and women, that is, gender features. Based on the extracted feature values, it classifies the voice information into a male group (M) or a women-and-children group (FC).

The age-specific feature extraction unit-M 112 extracts feature values from voice information classified into the male group (M) by the gender feature extraction unit 111. Because the voice information input at this stage has already been judged to be male, feature values can be extracted in a way that reflects the age-specific characteristics of men.

The age-specific feature extraction unit-FC 113 extracts feature values, reflecting the age-specific characteristics of women and children, from voice information classified into the women-and-children group (FC) by the gender feature extraction unit 111. It then further divides the input voice information into a female group (F) and a child group (C). Here, the child group (C) covers people whose voices have not yet changed at puberty, for whom male and female characteristics are difficult to distinguish.

The age-specific feature extraction unit-F 114 extracts feature values, reflecting the age-specific characteristics of women, from voice information classified into the female group (F) by the age-specific feature extraction unit-FC 113.

The gender feature extraction unit-C 115 extracts feature values, reflecting the gender characteristics of children, from voice information classified into the child group (C) by the age-specific feature extraction unit-FC 113.
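The two-stage grouping performed by units 111 to 115 can be summarized as a small routing function. This is a sketch only: how each unit actually decides "male" or "child" is not specified here, so the decisions are passed in as booleans, and the unit labels are hypothetical names built from the reference numerals.

```python
from typing import List, Tuple


def voice_extraction_path(is_male: bool, is_child: bool) -> Tuple[str, List[str]]:
    """Return (group, extraction units visited) for one voice sample.

    is_male  stands for the M / FC decision made by unit 111.
    is_child stands for the F / C decision made by unit 113; it is
    ignored when the sample was already placed in the male group.
    """
    path = ["gender_feature_111"]
    if is_male:
        return "M", path + ["age_feature_M_112"]
    path.append("age_feature_FC_113")
    if is_child:
        return "C", path + ["gender_feature_C_115"]
    return "F", path + ["age_feature_F_114"]


group, units = voice_extraction_path(is_male=False, is_child=True)
```

Only the units on the returned path run for a given sample, which is how the grouping avoids applying every extractor to every input.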

The voice calculation unit 120 receives the feature values extracted by the voice feature extraction unit 110 as described above and identifies the gender and age of the input voice.

To this end, the voice calculation unit 120 includes a combination calculation unit that determines a representative feature value by applying weights to the feature values extracted by the voice feature extraction unit 110, and an identification unit that identifies gender and age from the determined representative feature value by referring to a reference DB that stores reference feature values for each gender and age, or reference voice and video samples.
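As a rough illustration of the identification unit, the sketch below looks up the label whose reference feature value lies closest to the representative feature value. The nearest-value rule, the scalar feature, and every entry in `REFERENCE_DB` are assumptions made for the example; the text only states that a reference DB is consulted.

```python
from typing import Dict, Tuple

Label = Tuple[str, str]  # (gender, age band)

# Hypothetical reference DB: one made-up scalar reference value per label.
REFERENCE_DB: Dict[Label, float] = {
    ("male", "adult"): 1.0,
    ("female", "adult"): 2.0,
    ("male", "child"): 3.0,
}


def identify(representative: float, db: Dict[Label, float]) -> Label:
    """Return the label whose reference value is nearest to the representative value."""
    if not db:
        raise ValueError("reference DB is empty")
    return min(db, key=lambda label: abs(db[label] - representative))


label = identify(1.2, REFERENCE_DB)
```

A real identification unit would compare feature vectors or model scores rather than a single number, but the lookup structure is the same.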

As shown in FIG. 2, the voice calculation unit 120 is preferably configured with a separate combination calculation unit and identification unit optimized for each of the male group (M), female group (F), and child group (C) formed by the voice feature extraction unit 110.

The following description is based on the embodiment shown in FIG. 2, in which a combination calculation unit and an identification unit are provided for each group.

The voice calculation unit 120 can consist of a voice calculation unit-M 121 that receives the feature values extracted from voice information classified into the male group (M) by the voice feature extraction unit 110 and calculates gender and age, a voice calculation unit-F 122 that receives the feature values extracted from voice information classified into the female group (F) and calculates gender and age, and a voice calculation unit-C 123 that receives the feature values extracted from voice information classified into the child group (C) and calculates gender and age.

More specifically, the voice calculation unit-M 121 includes a combination calculation unit-M 121A and an identification unit-M 121B. The combination calculation unit-M 121A determines a representative feature value by assigning weights to the one or more feature values extracted from voice information classified into the male group (M). The identification unit-M 121B identifies gender and age from that representative feature value by referring to the reference DB. Because the combination calculation unit-M 121A operates on voice information classified into the male group, it receives the feature values extracted by the gender feature extraction unit 111 and the age-specific feature extraction unit-M 112, as described above.

Similarly, the voice calculation unit-F 122 includes a combination calculation unit-F 122A that determines a representative feature value by assigning weights to the one or more feature values extracted from voice information classified into the female group (F), and an identification unit-F 122B that identifies gender and age from that representative feature value by referring to the reference DB. As described above, the combination calculation unit-F 122A receives the feature values extracted by the gender feature extraction unit 111, the age-specific feature extraction unit-FC 113, and the age-specific feature extraction unit-F 114.

The voice calculation unit-C 123 includes a combination calculation unit-C 123A that determines a representative feature value by assigning weights to the one or more feature values extracted from voice information classified into the child group (C), and an identification unit-C 123B that identifies gender and age from that representative feature value by referring to the reference DB. As described above, the combination calculation unit-C 123A receives the feature values extracted by the gender feature extraction unit 111, the age-specific feature extraction unit-FC 113, and the gender feature extraction unit-C 115.

Age and gender can be identified using algorithms such as GMM (Gaussian Mixture Model), NN (Neural Network), and SVM (Support Vector Machine). These algorithms are only examples; it goes without saying that various other algorithms can also be used to identify age and gender from the feature values.

For example, when the GMM algorithm is used, each of the combination calculation units 121A, 122A, and 123A calculates N likelihood values, corresponding to N feature extraction methods or N samples, and determines a representative value from those N likelihood values. To determine the representative value, the combination calculation units 121A, 122A, and 123A can take the mean, the maximum, the minimum, or the sum of the N likelihood values.
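The four reduction rules named above (mean, maximum, minimum, sum) can be sketched as follows. The function name `representative_value` and the `rule` parameter are illustrative assumptions, and the likelihood values are toy numbers rather than output of a trained GMM.

```python
from statistics import mean
from typing import Callable, Dict, Sequence

# The four reduction rules mentioned in the text.
COMBINERS: Dict[str, Callable[[Sequence[float]], float]] = {
    "mean": mean,
    "max": max,
    "min": min,
    "sum": sum,
}


def representative_value(likelihoods: Sequence[float], rule: str = "mean") -> float:
    """Reduce N likelihood values to one representative value using the chosen rule."""
    if not likelihoods:
        raise ValueError("at least one likelihood value is required")
    if rule not in COMBINERS:
        raise ValueError(f"unknown rule: {rule}")
    return COMBINERS[rule](likelihoods)


# N = 3 toy likelihood values for one candidate class.
likelihoods = [0.25, 0.5, 0.75]
by_rule = {rule: representative_value(likelihoods, rule) for rule in COMBINERS}
```

Which rule works best is an empirical choice; the minimum is conservative, while the maximum lets a single confident measurement dominate.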

The combination calculation units 121A, 122A, and 123A can also apply weights when determining the representative feature value. The weights can be set case by case or from empirically accumulated information. For example, in an environment where noise occurs frequently, the portion of the feature values that falls in the noise band can be given a low weight, while the portion that falls in the middle of the typical voice band can be given a high weight. Each of the combination calculation units 121A, 122A, and 123A can also assign different weights for each of the groups described above (male, female, child), reflecting that group's voice characteristics, when determining the representative feature value.
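A minimal sketch of the weighting idea, assuming the representative feature value is a normalized weighted average over per-band feature values. The normalization, the three-band layout, and every number in `GROUP_WEIGHTS` are hypothetical; the text only says that noise-band components get low weights and that weights may differ per group.

```python
from typing import Dict, List, Sequence

# Hypothetical per-group weights over three bands: [noise band, mid voice band, high band].
GROUP_WEIGHTS: Dict[str, List[float]] = {
    "M": [0.0, 1.0, 1.0],
    "F": [0.0, 2.0, 1.0],
    "C": [0.0, 1.0, 3.0],
}


def weighted_representative(features: Sequence[float], weights: Sequence[float]) -> float:
    """Return the weighted average of the feature values (weights need not sum to 1)."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return sum(f * w for f, w in zip(features, weights)) / total


# Toy per-band feature values; the first band is dominated by noise.
band_features = [10.0, 2.0, 4.0]
rep_m = weighted_representative(band_features, GROUP_WEIGHTS["M"])
```

With the noise band weighted at zero, the large first value does not distort the result.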

The description above divides voice information into a male group, a female group, and a child group. When it is difficult to assign the feature values extracted from voice information to a single group, however, the voice information is preferably applied to each candidate group in parallel. That is, for voice information that is hard to classify, the calculation is performed for each candidate group, and the final identification unit 300 then makes the final determination of age and gender by considering the similarity between the results of the identification units, the probability of correct identification, the reliability, and so on.
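The overlapping application to several groups could look like the sketch below, which runs every candidate group's identifier and keeps the result with the highest confidence. Selecting by a single `confidence` score is an assumption standing in for the similarity, correct-identification probability, and reliability criteria mentioned above; the stub identifiers and their outputs are invented for the example.

```python
from typing import Callable, Dict, List, NamedTuple, Sequence


class Identification(NamedTuple):
    group: str
    gender: str
    age_band: str
    confidence: float


Identifier = Callable[[Sequence[float]], Identification]


def identify_with_overlap(
    features: Sequence[float],
    candidate_groups: List[str],
    identifiers: Dict[str, Identifier],
) -> Identification:
    """Run the identifier of every candidate group and keep the most confident result."""
    results = [identifiers[g](features) for g in candidate_groups]
    if not results:
        raise ValueError("at least one candidate group is required")
    return max(results, key=lambda r: r.confidence)


def _stub(group: str, gender: str, age_band: str, confidence: float) -> Identifier:
    """Build a fixed-output identifier; a real one would score the features."""
    return lambda features: Identification(group, gender, age_band, confidence)


identifiers = {
    "F": _stub("F", "female", "20s", 0.62),
    "C": _stub("C", "male", "under 10", 0.81),
}
best = identify_with_overlap([0.3, 0.7], ["F", "C"], identifiers)
```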

FIG. 3 is a detailed block diagram of the video processing unit of FIG. 1.

As shown in FIG. 3, the video processing unit 200 includes a video feature extraction unit 210 that extracts feature values from video information and a video calculation unit 220 that calculates gender and age from the extracted feature values.

The video feature extraction unit 210 receives video information and extracts feature values. It can further include an age-specific feature extraction unit 211, an age-specific feature extraction unit-C 212, a gender feature extraction unit-C 213, a gender feature extraction unit-A 214, an age-specific feature extraction unit-M 215, and an age-specific feature extraction unit-F 216.

The age-specific feature extraction unit 211 extracts feature values from the input face information in a way that reflects age-specific characteristics and, based on the extracted feature values, classifies the input face information into an adult group (A) or a child group (C). With face information, for example, adults and children are easy to distinguish based on the ratio of eye size to face size, the presence or absence of wrinkles around the eyes, and so on. The age-specific feature extraction unit 211 extracts feature values from the input face information in a way that reflects such age-specific characteristics.

For face information classified into the child group (C) by the age-specific feature extraction unit 211, the age-specific feature extraction unit-C 212 extracts feature values reflecting the age-specific characteristics of children, and the gender feature extraction unit-C 213 extracts feature values reflecting the gender characteristics of children.

The gender feature extraction unit-A 214 extracts feature values, reflecting the gender characteristics of adults, from face information classified into the adult group (A) by the age-specific feature extraction unit 211. Based on the extracted feature values, it divides the input face information into a male group (M) and a female group (F).

The age-specific feature extraction unit-M 215 extracts feature values, reflecting the age-specific characteristics of men, from face information classified into the male group (M) by the gender feature extraction unit-A 214. The age-specific feature extraction unit-F 216 extracts feature values, reflecting the age-specific characteristics of women, from face information classified into the female group (F) by the gender feature extraction unit-A 214.
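For face information the grouping runs in the opposite order from voice: age first (adult or child), then gender for adults. The lookup table below sketches which extraction units contribute feature values for each final group. The unit labels are hypothetical names built from the reference numerals, and the adult/male decisions are passed in as booleans because the decision rules themselves are not specified.

```python
from typing import Dict, List, Tuple

# Extraction units that contribute feature values for each final face group.
FACE_PATHS: Dict[str, List[str]] = {
    "C": ["age_feature_211", "age_feature_C_212", "gender_feature_C_213"],
    "M": ["age_feature_211", "gender_feature_A_214", "age_feature_M_215"],
    "F": ["age_feature_211", "gender_feature_A_214", "age_feature_F_216"],
}


def face_group(is_adult: bool, is_male: bool) -> str:
    """Map the A / C decision (unit 211) and the M / F decision (unit 214) to a group."""
    if not is_adult:
        return "C"  # gender is not used for grouping children
    return "M" if is_male else "F"


def face_extraction_path(is_adult: bool, is_male: bool) -> Tuple[str, List[str]]:
    """Return (group, extraction units visited) for one face sample."""
    group = face_group(is_adult, is_male)
    return group, FACE_PATHS[group]


group, units = face_extraction_path(is_adult=True, is_male=False)
```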

The video calculation unit 220 identifies gender and age from the video information using the feature values extracted by the video feature extraction unit 210 as described above.

That is, the video calculation unit 220 consists of a combination calculation unit that determines a representative feature value by applying weights to the one or more feature values extracted by the video feature extraction unit 210, and an identification unit that identifies gender and age from the representative feature value by referring to the reference DB.

As shown in FIG. 3, the video calculation unit 220 is configured with a combination calculation unit and an identification unit optimized for each of the male group (M), female group (F), and child group (C) formed by the video feature extraction unit 210 as described above. That is, the video calculation unit 220 consists of a video calculation unit-M 221 that receives the feature values extracted from video information classified into the male group and calculates age and gender, a video calculation unit-F 222 that receives the feature values extracted from video information classified into the female group and calculates age and gender, and a video calculation unit-C 223 that receives the feature values extracted from video information classified into the child group and calculates age and gender.

The video calculation unit-M 221 can include a combination calculation unit-M 221A that receives the one or more feature values extracted from face information classified into the male group (M) and determines a representative feature value, and an identification unit-M 221B that identifies gender and age from that representative feature value by referring to the reference DB. Because the combination calculation unit-M 221A operates on face information classified into the male group (M), it receives the feature values extracted by the age-specific feature extraction unit 211, the gender feature extraction unit-A 214, and the age-specific feature extraction unit-M 215 and determines the representative feature value from them.

The video calculation unit-F 222 may include a combination calculation unit-F 222A, which receives feature values extracted from face information classified into the female group (F) and determines a representative feature value, and an identification unit-F 222B, which identifies gender and age using that representative feature value and the reference DB.

In this case, because the combination calculation unit-F 222A handles face information classified into the female group (F), it receives the feature values extracted by the age-specific feature extraction unit 211, the gender feature extraction unit-A 214, and the age-specific feature extraction unit-F 216 and determines the representative feature value from them.

The video calculation unit-C 223 may include a combination calculation unit-C 223A, which receives feature values extracted from face information classified into the child group (C) and determines a representative feature value, and an identification unit-C 223B, which identifies gender and age using that representative feature value and the reference DB. In this case, because the combination calculation unit-C 223A handles face information classified into the child group (C), it receives the feature values extracted by the age-specific feature extraction unit 211, the age-specific feature extraction unit-C 212, and the gender feature extraction unit-C 213 and determines the representative feature value from them.

Each of the identification units 221B, 222B, and 223B receives the representative feature value from the corresponding combination calculation unit 221A, 222A, or 223A and calculates gender and age by referring to the reference DB. The details are similar to those described above for the voice calculation unit 120, so further description is omitted.

Also when age and gender are calculated using the video processing unit 200, if it is difficult to classify the face information into exactly one of the male group (M), the female group (F), and the child group (C), the face information can, as described above, be applied to more than one group in duplicate.
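
The duplicate application described above can be sketched as follows. This is only an illustration: the description does not say how an ambiguous classification is detected, so the per-group scores, the `margin` parameter, and the function name `candidate_groups` are all assumptions.

```python
def candidate_groups(scores, margin=0.1):
    """Return every group whose score is within `margin` of the best score.

    `scores` maps a group label ("M", "F", "C") to a hypothetical
    classification score; both the scores and the margin are assumptions
    made for this sketch, not values taken from the description.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    best = max(scores.values())
    # Keep the best group plus any group that is nearly as likely, so an
    # ambiguous face is processed under more than one group.
    return sorted(g for g, s in scores.items() if best - s <= margin)
```

A clearly classified face yields a single group, while a borderline one is passed to each plausible group's calculation unit.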

The final identification unit 300 is described in detail below.

The final identification unit 300 receives the gender and age output by some or all of the identification units 121B, 122B, 123B, 221B, 222B, and 223B and performs a combination operation on the received values to identify the final gender and age.

That is, a mutual similarity can be calculated for each of the received gender and age results, and the gender and age with the highest mutual similarity can be chosen as the final gender and age. Alternatively, the probability of correct identification or a reliability index for each received gender and age result can be recorded each time an identification is made and then used to determine the final gender and age.
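
One way to read the mutual-similarity selection is sketched below. The description does not define the similarity measure, so the `similarity` function here (a gender match term plus an age-closeness term scaled by `age_scale`) is purely an assumption, as are the function names.

```python
def similarity(a, b, age_scale=10.0):
    """Hypothetical similarity between two (gender, age) estimates."""
    gender_term = 1.0 if a[0] == b[0] else 0.0
    # Age closeness decays from 1.0 as the age gap grows.
    age_term = 1.0 / (1.0 + abs(a[1] - b[1]) / age_scale)
    return gender_term + age_term


def fuse_by_mutual_similarity(estimates):
    """Pick the estimate that agrees most with all the other estimates."""
    if not estimates:
        raise ValueError("at least one estimate is required")

    def total(i):
        return sum(
            similarity(estimates[i], other)
            for j, other in enumerate(estimates)
            if j != i
        )

    best_index = max(range(len(estimates)), key=total)
    return estimates[best_index]
```

With estimates such as `("male", 30)`, `("male", 34)`, `("female", 12)`, and `("male", 31)`, the outlier contributes little similarity to any other result, and the estimate closest to the majority is selected.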

The final identification unit 300 can be implemented to identify one gender and age from the results output by the voice processing unit 100 using mutual similarity, identify another from the results output by the video processing unit 200 using mutual similarity, and then use the two identified results to identify and output the final gender and age.

Alternatively, the final identification unit 300 can be implemented to apply mutual similarity to all the gender and age identification results output by the voice processing unit 100 and the video processing unit 200 together, and to identify and output the final gender and age from them.

The reference DB is described in detail below.

The reference DB stores reference feature values by gender and age, or reference voice and video samples, and consists of feature values extracted from face information or voice information and a model relating those feature values to gender and age.

Using the correspondence between feature values and gender and age stored in the reference DB, the voice calculation unit 120 or the video calculation unit 220 can obtain gender and age by looking up the representative feature value described above in the reference DB. For example, an identification unit can identify gender and age using the distance between the representative feature value and the relationship model in the reference DB.
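
The distance-based lookup can be illustrated with a nearest-entry search. This is a minimal sketch under stated assumptions: the description does not specify the distance measure or the DB layout, so the Euclidean distance, the two-dimensional feature vectors, and the contents of `REFERENCE_DB` are all made up for illustration.

```python
import math

# Hypothetical reference DB: (feature vector, gender, age) entries.
REFERENCE_DB = [
    ((0.2, 0.8), "male", 25),
    ((0.7, 0.3), "female", 35),
    ((0.9, 0.9), "female", 8),
]


def identify(representative, reference_db=REFERENCE_DB):
    """Return the gender and age of the entry nearest to `representative`."""
    if not reference_db:
        raise ValueError("reference DB is empty")
    # Euclidean distance is an assumption; any distance measure could be used.
    _, gender, age = min(
        reference_db, key=lambda entry: math.dist(entry[0], representative)
    )
    return gender, age
```

A representative feature value of `(0.25, 0.75)` lies closest to the first entry, so the lookup returns that entry's gender and age.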

The reference DB also contains video data and voice data together with the corresponding gender and age, so that gender and age can be identified directly from the video or voice information when, for example, feature values cannot be extracted reliably.

The video data in the reference DB can be acquired, for example, with the camera placed 0.5 m, 1 m, and 3 m from the person. At a separation of 3 m, the shot is framed to include the person's whole body. The video can be captured at 100 frames per 10 seconds (10 frames per second). A face detector, a height detector, an eye detector, and the like can then be applied to the captured video to obtain each subject's face, hairstyle, beard, eyebrow shape, and so on, and to build a detailed DB. The present invention can be implemented so that feature values are identified using this detailed DB.

The voice data in the reference DB can be obtained, for example, by uttering 50 prepared sentences three times each. The voice data can take various forms, such as 16 kHz, 16-bit, mono.

So that the reference DB is a representative sample, its data can be collected from, for example, 120 people, with an overall male-to-female ratio of 1:1 and equal proportions across the age brackets.

The reference DB has a learning capability: when a gender-age calculation is performed according to an embodiment of the present invention, the result (the representative feature value used and the final gender and age) is preferably reflected in the existing data so that the DB is reconstructed (updated) and its reliability improves continuously. Needless to say, only results whose reliability has been confirmed should be used to update the DB.
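
The update rule can be sketched as a guarded append. How reliability is confirmed is not specified in the description, so the numeric `reliability` score, the `threshold` value, and the list-based DB layout below are assumptions for illustration only.

```python
def update_reference_db(reference_db, representative, gender, age,
                        reliability, threshold=0.9):
    """Add a result to the DB only if its reliability is confirmed.

    `reliability` and `threshold` are hypothetical; the description only
    requires that unconfirmed results not be used for the update.
    Returns True if the DB was updated.
    """
    if reliability < threshold:
        return False  # Unconfirmed results must not change the DB.
    reference_db.append((tuple(representative), gender, age))
    return True
```

Rejecting low-reliability results keeps misidentifications from being fed back into the data that later identifications depend on.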

FIG. 4 is a flowchart of the gender-age identification method according to the present invention.

The input unit 10 collects face information and voice information of the specific person whose gender and age are to be identified (S100).

The voice processing unit 100 extracts feature values from the collected voice information, reflecting age-specific and gender features, and determines a representative feature value from the one or more extracted feature values. It then queries the reference DB with the representative feature value to identify gender and age (S200).

In parallel, the video processing unit 200 extracts feature values from the face information, reflecting age-specific and gender features, and determines a representative feature value from the one or more extracted feature values. It then queries the reference DB with the representative feature value to identify gender and age (S300).

The final identification unit 300 identifies the final gender and age from the at least one gender and age result identified in steps S200 and S300, taking mutual similarity or probability into account (S400).
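
The probability-based variant of step S400 could look like the following sketch. The description does not give a formula, so the weighted vote for gender, the weighted mean for age, and the `reliability` values are all assumptions chosen for illustration.

```python
from collections import defaultdict


def fuse_by_reliability(estimates):
    """Combine (gender, age, reliability) estimates into one result.

    Gender is chosen by reliability-weighted vote and age by
    reliability-weighted mean; both rules are assumptions of this sketch.
    """
    if not estimates:
        raise ValueError("at least one estimate is required")
    votes = defaultdict(float)
    for gender, _, weight in estimates:
        votes[gender] += weight
    final_gender = max(votes, key=votes.get)
    total_weight = sum(weight for _, _, weight in estimates)
    final_age = sum(age * weight for _, age, weight in estimates) / total_weight
    return final_gender, final_age
```

For example, two voice-based and video-based results of moderate reliability can outvote a single low-reliability result that disagrees with them.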

The step of identifying gender and age from voice (S200) in FIG. 4 is described in detail below with reference to FIG. 5.

Women's and children's voices are generally similar and hard to tell apart, whereas both are easy to distinguish from men's voices. Exploiting this, the method first extracts feature values from the voice signal that preferentially reflect gender features and classifies the input into a male group or a female-and-child group (S210).

Giving priority to gender features for voice information exploits the fact that gender accounts for large differences in voice, which allows the calculation to be performed quickly and efficiently.

According to the classification result, the input voice information is assigned to the male group or the female-and-child group, and one or more age-specific feature values reflecting male age-specific features are extracted from voice information classified into the male group (S220).

For voice information classified into the female-and-child group, age-specific feature values reflecting the age-specific features of women and children are extracted so that the voice can be assigned to the female group or the child group, and women are distinguished from children (S230).

Age-specific feature extraction reflecting female age-specific features is then performed on voice information assigned to the female group (S240).

For voice information assigned to the child group, gender and age-specific feature extraction for children is performed (S250).

From the feature values extracted in this way, a representative feature value for the voice information is determined, and the subject's gender and age are identified.

For example, the voice calculation unit 120 determines a representative feature value from the one or more feature values extracted by the voice feature extraction unit 110 and identifies gender and age from it using the reference DB. As described above, the determination of the representative feature value and the identification of gender and age are preferably performed separately for the male, female, and child groups.

That is, gender and age can be identified by performing voice calculation-M on the feature values of voice information classified into the male group (S225), voice calculation-F on those classified into the female group (S245), or voice calculation-C on those classified into the child group (S255).

One of the major features of the present invention is that voice information is first grouped by an easily distinguished feature (for voice information, gender), and feature values are then extracted for each resulting group using features specific to that group. This stepwise extraction secures identification accuracy and, by eliminating redundant computation, lets the subject's age and gender be identified quickly.
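
The stepwise voice flow of FIG. 5 can be sketched as a routing function. The step labels (S210 to S255) come from the description; the classifier callables and the pitch thresholds in the example predicates are hypothetical stand-ins, since the description does not specify how each decision is made.

```python
def route_voice(sample, is_male, is_child):
    """Return (group, steps) for one voice sample.

    `is_male` and `is_child` are caller-supplied predicates standing in
    for the gender-first (S210) and woman-versus-child (S230) decisions.
    """
    steps = ["S210"]
    if is_male(sample):
        steps += ["S220", "S225"]  # male age features, voice calculation-M
        return "M", steps
    steps.append("S230")
    if is_child(sample):
        steps += ["S250", "S255"]  # child features, voice calculation-C
        return "C", steps
    steps += ["S240", "S245"]      # female age features, voice calculation-F
    return "F", steps


# Illustrative predicates only; the thresholds are assumptions.
def example_is_male(sample):
    return sample["pitch_hz"] < 165.0


def example_is_child(sample):
    return sample["pitch_hz"] > 250.0
```

Each sample passes through only the extraction steps of its own branch, which is the source of the reduced computation noted above.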

The step of identifying gender and age from video (S300) in FIG. 4 is described in detail below with reference to FIG. 6.

With video information, it is generally easy to distinguish adults from children, for example by using biometric information such as height, or the ratio of the sizes of the ears, eyes, mouth, and nose to the size of the face.

Exploiting this, the video similarity identification step of the present invention first performs, on the input video information (face information, or video information that includes face information; hereinafter "face information"), feature value extraction that takes these age-specific features into account (S310). This step allows the input face information to be readily divided into a child group and an adult group.

Then, for face information classified into the child group, age-specific features are extracted taking children's age-specific features into account (S320), and gender feature extraction taking children's gender features into account is performed (S330).

For face information classified into the adult group, gender feature extraction taking adult gender features into account is performed to divide it into the male group or the female group (S340).

Then one or more feature values are extracted from face information classified into the male group using a feature extraction method that takes male age-specific features into account (S350). For face information classified into the female group, one or more feature values are extracted taking female age-specific features into account (S360).
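
The video extraction flow (S310 to S360) mirrors the voice flow but splits on age first. The sketch below uses the step labels from the description; the predicates, the `height_cm` and `male_score` fields, and their thresholds are hypothetical.

```python
def route_face(face, is_child, is_male):
    """Return (group, extraction steps) for one face sample.

    `is_child` stands in for the adult-versus-child decision after S310
    and `is_male` for the gender decision at S340.
    """
    steps = ["S310"]
    if is_child(face):
        steps += ["S320", "S330"]  # child age features, then child gender features
        return "C", steps
    steps.append("S340")
    if is_male(face):
        steps.append("S350")       # male age-specific features
        return "M", steps
    steps.append("S360")           # female age-specific features
    return "F", steps


# Illustrative predicates only; field names and thresholds are assumptions.
def example_is_child(face):
    return face["height_cm"] < 140.0


def example_is_male(face):
    return face["male_score"] >= 0.5
```

Age comes first here because, as noted above, adults and children are the easiest pair to separate in video.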

The video calculation unit 220 applies weights to the feature values extracted by the video feature extraction unit 210 as described above to determine a representative feature value, and identifies gender and age using that representative feature value and the reference DB. As shown in FIG. 6, this video-based gender and age identification is preferably performed separately for the child, male, and female groups (S325, S355, S365).
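
The weighted determination of a representative feature value can be sketched as below. The four modes follow the options listed in the claims (average, maximum, minimum, sum of the weighted feature values); treating each feature as a scalar, taking the plain mean of the weighted values for "average", and the function name are assumptions of this sketch.

```python
def representative_value(features, weights, mode="average"):
    """Combine weighted feature values into one representative value."""
    if not features or len(features) != len(weights):
        raise ValueError("features and weights must be non-empty and equal in length")
    weighted = [f * w for f, w in zip(features, weights)]
    if mode == "average":
        return sum(weighted) / len(weighted)
    if mode == "max":
        return max(weighted)
    if mode == "min":
        return min(weighted)
    if mode == "sum":
        return sum(weighted)
    raise ValueError(f"unknown mode: {mode}")
```

For features `[2.0, 4.0, 6.0]` with weights `[0.5, 1.0, 0.5]`, the weighted values are `[1.0, 4.0, 3.0]`, so the sum is 8.0, the maximum 4.0, and the minimum 1.0.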

The present invention has been described in detail above with reference to the accompanying drawings, but this is only illustrative, and it is evident that various modifications and changes are possible within the technical idea of the present invention. Accordingly, the scope of protection of the present invention should not be limited to the embodiments described above but should be determined by the appended claims and their equivalents.

DESCRIPTION OF SYMBOLS
10 Input unit
20 Age-gender calculation unit
30 Output unit
100 Voice processing unit
200 Video processing unit
300 Final identification unit

Claims

Collecting video information and audio information;
Extracting one or more feature values from the collected voice information, and using the extracted feature values to identify gender and age;
Extracting one or more feature values for the collected video information, and using the extracted feature values to identify gender and age;
Finally determining the gender and age by performing a combination operation of the gender and age identified using the voice information and the gender and age identified using the face information;
A gender-age identification method comprising:

The gender and age identification step using the audio information includes:
A first gender feature extraction step of extracting a feature value reflecting the gender feature of the voice with respect to the input voice information;
A first age-specific feature extraction step for extracting feature values reflecting the male age-specific features for the voice information divided into male groups by the first gender feature extraction step;
A second age-specific feature extracting step of extracting feature values reflecting the age-specific features of women and children with respect to the voice information divided into the female and child groups by the first sex-feature extracting step;
The gender-age identification method according to claim 1, further comprising:

The gender and age identification step using the audio information includes:
A third age-specific feature extraction step for extracting a feature value reflecting the female age-specific feature for the voice information classified into the female group by the second age-specific feature extracting step;
A second sex feature extraction step for extracting a feature value reflecting the sex feature of the child with respect to the voice information divided into child groups by the second age feature extraction step;
The gender-age identification method according to claim 2, further comprising:

The gender-age identification method according to claim 2 or 3, wherein the feature values are extracted from M samples by applying different N feature value identification methods.

The gender and age identification step using the audio information includes:
A representative feature value determining step of determining a representative feature value by reflecting a weight value with respect to the one or more extracted feature values;
An identification step of identifying the gender and age with reference to a reference DB based on the representative feature value;
The gender-age identification method according to claim 1, further comprising:

The gender-age identification method according to claim 5, wherein the representative feature value determination step and the identification step are performed for each of a male group, a female group, and a child group.

The gender-age identification method according to claim 5, wherein the representative feature value determining step includes determining, as the representative feature value, any one of an average value, a maximum value, a minimum value, and a sum of the one or more feature values to which the weights have been applied.

Gender and age identification step using the video information,
A first feature extracting step of extracting a feature value reflecting the age-specific feature with respect to the collected video information;
A second feature extracting step of distinguishing between an adult and a child according to a result of the first feature extracting step, and then classifying the group into a group of men, women and children and extracting one or more feature values for each group;
The gender-age identification method according to claim 1, further comprising:

The gender and age identification step using the video information includes:
A representative feature value determining step of determining a representative feature value by applying weights to the one or more extracted feature values;
An identification step of identifying the gender and age using a reference DB based on the representative feature value;
The gender-age identification method according to claim 8, wherein the representative feature value determining step and the identification step are performed for each of a male group, a female group, and a child group.

The step of finally determining the gender and age includes:
Calculating a mutual similarity for at least one gender and age identified using the voice information and at least one gender and age identified using the video information;
Determining the gender and age with the highest mutual similarity as the final gender and age;
The gender-age identification method according to claim 1, further comprising:

10. The reference DB according to claim 5, wherein the reference DB includes gender and age characteristic values, and is continuously reconstructed to reflect characteristic values whose reliability is confirmed according to gender and age. Described gender-age identification method.

An input unit for collecting video information and audio information;
A voice processing unit that extracts a feature value for the collected voice information, and uses the extracted feature value to identify gender and age from the voice information;
Extracting a feature value for the collected video information, and using the extracted feature value, a video processing unit for identifying gender and age from the video information;
A final identification unit that finally determines the gender and age of the specific person by performing a combination operation of the sex and age identified by the video processing unit and the gender and age identified by the audio processing unit;
A gender-age identification device comprising:

The voice processing unit
A voice feature extraction unit that extracts a feature value by reflecting a gender feature or an age feature of the voice with respect to the collected voice information;
A voice calculation unit that determines a representative feature value from the feature values extracted by the voice feature extraction unit and identifies the gender and age using the representative feature value;
The gender-age identification apparatus according to claim 12, comprising:

The gender-age identification apparatus according to claim 13, wherein the voice feature extraction unit determines whether the collected voice is a male voice and then extracts one or more feature values for each of the male, female, and child groups.

The gender-age identification apparatus according to claim 14, wherein the voice calculation unit determines a representative characteristic value for each of the male, female, and child groups, and identifies the gender and the age using the representative characteristic value.

The video processing unit
A video feature extraction unit that extracts a feature value by reflecting a gender feature or an age feature of the video with respect to the collected video information;
A video calculation unit that determines a representative feature value from the feature values extracted by the video feature extraction unit and identifies the gender and age using the representative feature value;
The gender-age identification apparatus according to claim 12, comprising:

The gender-age identification apparatus according to claim 16, wherein the video feature extraction unit determines whether the collected video shows an adult or a child and then extracts one or more feature values for each of the male, female, and child groups.

The gender-age identification apparatus according to claim 17, wherein the video calculation unit determines representative characteristic values for each of the male, female, and child groups, and identifies gender and age using the representative characteristic values.

The gender-age identification apparatus according to claim 12, wherein the final identification unit calculates a mutual similarity for each of at least one gender and age identified by the voice processing unit or the video processing unit, and determines the gender and age with the highest mutual similarity as the final gender and age.