JP6927495B2

JP6927495B2 - Person evaluation device, program, and method

Info

Publication number: JP6927495B2
Application number: JP2017237950A
Authority: JP
Inventors: 武士阿野
Original assignee: 株式会社テイクアンドシー; 株式会社カラーチップス
Priority date: 2017-12-12
Filing date: 2017-12-12
Publication date: 2021-09-01
Anticipated expiration: 2037-12-12
Also published as: JP2019105729A

Description

The present invention relates to a person evaluation device used for evaluating a person, and to a person evaluation program, a person evaluation method, and the like used in such a person evaluation device.

For example, when a company hires new employees, hiring managers spend a great deal of time and effort evaluating a large number of applicants and selecting those to be hired. In addition, evaluation criteria may differ from one hiring manager to another, so there is a demand for objective criteria for evaluating applicants. If reference information for evaluating an applicant could be obtained from the applicant's voice or moving images, the selection work of hiring managers would become more efficient and the objectivity of the evaluation criteria would improve.

Meanwhile, when communicating with an unspecified number of people by telephone or over the Internet, there is also a demand for a tool that can judge whether a communication partner can be trusted. As related art, Patent Document 1 discloses a person evaluation device that lets users communicate with peace of mind in a community that has no mechanism for judging whether a communication partner can be trusted.

This person evaluation device includes dictionary construction means and evaluation means. The dictionary construction means constructs a dictionary in which, for each evaluation item conforming to a unified evaluation scale, feature words contained in learning data are associated with scores for those feature words. The evaluation means evaluates each user based on the identification information and base values of a plurality of users by referring to the dictionary and to the data (for example, articles) of the plurality of users that make up the learning data.

In the dictionary, feature words in the learning data and scores corresponding to their frequency of appearance are registered for each evaluation item using a unified evaluation scale. The data constituting the learning data are evaluated based on the users' identification information while referring to the dictionary. According to Patent Document 1, users of an Internet community in which user-entered data accumulates can therefore be evaluated with high accuracy not only per item of data but also per person.

Patent Document 1: Japanese Unexamined Patent Publication No. 2006-190196 (paragraphs 0002 to 0008, FIG. 1)

Evaluating a person according to Patent Document 1 requires articles or other material written by a plurality of users. However, if an article does not reflect the user's own ideas and thinking, or if the user plagiarized another person's article when writing it, an accurate person evaluation cannot be performed.

In view of the above, a first object of the present invention is to provide a person evaluation device that can provide reference information for evaluating a subject, based on the voice of the subject to be evaluated or based on moving images and voice of the subject. A second object of the present invention is to provide a person evaluation program, a person evaluation method, and the like used in such a person evaluation device.

To solve at least part of the above problems, a person evaluation device according to a first aspect of the present invention includes a voice processing unit and a voice analysis unit. The voice processing unit Fourier transforms voice data, obtained by recording the voice of a subject, for each data block of a unit time and generates, for each data block, voiceprint data representing a sound pressure distribution over a plurality of frequency bands. The voice analysis unit classifies the data blocks according to the magnitude and spread of the sound pressure in the plurality of frequency bands. When evaluating the subject's voice based on the classification results of a predetermined number of data blocks, the voice analysis unit gives a data block whose sound pressure exceeds a threshold in any frequency region a higher score than a data block whose sound pressure is at or below the threshold in all frequency regions, gives a data block in which the number of frequency bands where the sound pressure exceeds the threshold and is a local maximum exceeds a predetermined value a higher score than a data block in which that number is at or below the predetermined value, and determines a rank for the subject's voice based on the total or average of the scores of the predetermined number of data blocks.

A person evaluation program according to the first aspect of the present invention causes a CPU to execute procedures (a) and (b). Procedure (a) Fourier transforms voice data, obtained by recording the voice of a subject, for each data block of a unit time and generates, for each data block, voiceprint data representing a sound pressure distribution over a plurality of frequency bands. Procedure (b) classifies the data blocks according to the magnitude and spread of the sound pressure in the plurality of frequency bands and, when evaluating the subject's voice based on the classification results of a predetermined number of data blocks, gives a data block whose sound pressure exceeds a threshold in any frequency region a higher score than a data block whose sound pressure is at or below the threshold in all frequency regions, gives a data block in which the number of frequency bands where the sound pressure exceeds the threshold and is a local maximum exceeds a predetermined value a higher score than a data block in which that number is at or below the predetermined value, and determines a rank for the subject's voice based on the total or average of the scores of the predetermined number of data blocks.

A person evaluation method according to the first aspect of the present invention includes steps (a) and (b). Step (a) Fourier transforms voice data, obtained by recording the voice of a subject, for each data block of a unit time and generates, for each data block, voiceprint data representing a sound pressure distribution over a plurality of frequency bands. Step (b) classifies the data blocks according to the magnitude and spread of the sound pressure in the plurality of frequency bands and, when evaluating the subject's voice based on the classification results of a predetermined number of data blocks, gives a data block whose sound pressure exceeds a threshold in any frequency region a higher score than a data block whose sound pressure is at or below the threshold in all frequency regions, gives a data block in which the number of frequency bands where the sound pressure exceeds the threshold and is a local maximum exceeds a predetermined value a higher score than a data block in which that number is at or below the predetermined value, and determines a rank for the subject's voice based on the total or average of the scores of the predetermined number of data blocks.

According to the first aspect of the present invention, voiceprint data representing a sound pressure distribution over a plurality of frequency bands is generated for each data block from voice data obtained by recording the subject's voice, and the data blocks are classified according to the magnitude and spread of the sound pressure in the plurality of frequency bands. When the subject's voice is evaluated based on the classification results of a predetermined number of data blocks, a rank for the subject's voice is determined from the scores given to the data blocks. Reference information for evaluating the subject can thus be provided based on the voice of the subject to be evaluated.

A person evaluation device according to a second aspect of the present invention is the person evaluation device according to the first aspect, further including an image processing unit, an image analysis unit, and a comprehensive evaluation unit. The image processing unit applies face recognition processing, frame by frame, to moving image data obtained by imaging the subject's face, thereby extracting a plurality of feature points recognized on the subject's face and obtaining the coordinates of the feature points. The image analysis unit calculates the amount of movement of the subject's face based on the coordinates of the feature points in a predetermined number of frames and visually evaluates the subject based on statistical processing of the amount of facial movement over an evaluation period. The comprehensive evaluation unit evaluates the subject as a person based on the evaluation result of the voice analysis unit and the evaluation result of the image analysis unit.

A person evaluation program according to the second aspect of the present invention is the person evaluation program according to the first aspect, further causing the CPU to execute procedures (c), (d), and (e). Procedure (c) applies face recognition processing, frame by frame, to moving image data obtained by imaging the subject's face, thereby extracting a plurality of feature points recognized on the subject's face and obtaining the coordinates of the feature points. Procedure (d) calculates the amount of movement of the subject's face based on the coordinates of the feature points in a predetermined number of frames and visually evaluates the subject based on statistical processing of the amount of facial movement over an evaluation period. Procedure (e) evaluates the subject as a person based on the evaluation result of procedure (b) and the evaluation result of procedure (d).

A person evaluation method according to the second aspect of the present invention is the person evaluation method according to the first aspect, further including steps (c), (d), and (e). Step (c) applies face recognition processing, frame by frame, to moving image data obtained by imaging the subject's face, thereby extracting a plurality of feature points recognized on the subject's face and obtaining the coordinates of the feature points. Step (d) calculates the amount of movement of the subject's face based on the coordinates of the feature points in a predetermined number of frames and visually evaluates the subject based on statistical processing of the amount of facial movement over an evaluation period. Step (e) evaluates the subject as a person based on the evaluation result of step (b) and the evaluation result of step (d).

According to the second aspect of the present invention, the coordinates of a plurality of feature points recognized on the subject's face are obtained from moving image data obtained by imaging the subject's face, the amount of movement of the subject's face is calculated based on the coordinates of the feature points in a predetermined number of frames, and the subject is visually evaluated based on statistical processing of the amount of facial movement over an evaluation period. Reference information for evaluating the subject can thus be provided based on the moving images and voice of the subject to be evaluated.

A block diagram showing a configuration example of a person evaluation device according to an embodiment of the present invention.
A diagram showing an example of a voice waveform represented by voice data.
A diagram showing an example of a sound pressure distribution represented by voiceprint data.
A diagram for explaining an example of voice evaluation based on voiceprint data.
A diagram showing an example of an image used to calculate the amount of movement of the subject's face about a first axis of rotation.
A diagram for explaining the change in the height ratio between a first triangle and a second triangle caused by movement of the subject's face about the first axis.
A diagram showing an example of an image used to calculate the amount of movement of the subject's face about a second axis of rotation.
A diagram for explaining the change in the area ratio between a first triangle and a second triangle caused by movement of the subject's face about the second axis.
A diagram showing an example of an image used to calculate the amount of movement of the subject's face about a third axis of rotation.
A diagram showing an example of the probability distribution of the variance of a quantity representing the orientation of the subject's face during the evaluation period.
A diagram showing an example of a mapping area used to evaluate the subject as a person.
A flowchart showing a person evaluation method according to an embodiment of the present invention.
A flowchart showing an example of a processing flow for moving image data (first half).
A flowchart showing an example of a processing flow for moving image data (second half).

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. The same components are denoted by the same reference numerals, and duplicate description is omitted.

<Person evaluation device>
FIG. 1 is a block diagram showing a configuration example of a person evaluation device according to an embodiment of the present invention. A personal computer, a tablet terminal, a smartphone, or the like can be used as the person evaluation device. The following describes, as an example, the case where a personal computer is used as the person evaluation device.

As shown in FIG. 1, the person evaluation device includes an operation unit 10, a display unit 20, an input/output interface 30, a network interface 40, a CPU (central processing unit) 50, a memory 60, and a storage unit 70. The input/output interface 30 through the storage unit 70 are connected to one another via a bus line. Some of the components shown in FIG. 1 may be omitted or changed, or other components may be added to them.

The operation unit 10 consists of a keyboard, a mouse, and the like, and is used to input various commands and data. The display unit 20 includes, for example, an LCD (liquid crystal display) and displays an operation screen, an evaluation screen, and the like. The input/output interface 30 is connected to the operation unit 10 and the display unit 20. It supplies the commands and data input through the operation unit 10 to the CPU 50 or the memory 60, and supplies display data generated by the CPU 50 to the display unit 20.

The input/output interface 30 can also receive voice data or moving image data from outside and perform serial data transfer with peripheral devices such as a USB (Universal Serial Bus) memory. The input/output interface 30 may further include an analog/digital converter that converts an analog voice signal or image signal into digital voice data or moving image data.

The network interface 40 connects the CPU 50 to a network such as a LAN or the Internet. The CPU 50 performs various calculations and data processing in accordance with software stored in the storage unit 70. The memory 60 temporarily stores commands and data supplied from the input/output interface 30, data supplied from the network interface 40, data generated or processed by the CPU 50, and the like.

The storage unit 70 stores various data, as well as the software that causes the CPU 50 to operate, on a recording medium. In addition to a built-in hard disk, an external hard disk, a flexible disk, an MO, an MT, a CD-ROM, a DVD-ROM, various kinds of memory, or the like can be used as the recording medium.

Here, the CPU 50 and the software (including the person evaluation program) together constitute, as functional blocks, a voice processing unit 51, a voice analysis unit 52, an image processing unit 53, an image analysis unit 54, and a comprehensive evaluation unit 55.

The person evaluation device shown in FIG. 1 is supplied with voice data obtained by recording the voice of the subject to be evaluated. Alternatively, moving image data obtained by imaging the subject's face is supplied together with the voice data, in which case the moving image data and the voice data may be combined. As a further alternative, the analog/digital converter of the input/output interface 30 may convert an analog voice signal or image signal supplied to the person evaluation device into digital voice data or moving image data.

For example, voice data or moving image data obtained with a microphone, a mobile phone (such as a smartphone), a tablet terminal, a video camera, Skype, or the like may be supplied to the person evaluation device in real time. Alternatively, voice data or moving image data recorded in advance with a voice recorder, a camcorder, or the like may be supplied to the person evaluation device for batch processing.

<Voice processing>
Voice data or moving image data supplied from the input/output interface 30, the network interface 40, or the like is stored in a raw data storage unit 71. The voice processing unit 51 acquires the voice data by reading it from the raw data storage unit 71.

FIG. 2 is a diagram showing an example of a voice waveform represented by voice data. In FIG. 2, the horizontal axis represents time [seconds] and the vertical axis represents the amplitude of the voice waveform. For voice evaluation, for example, voice data representing the voice in a 30-second evaluation period from 5 seconds to 35 seconds after the start of recording is used. The amplitude of the voice waveform in the voice data may be normalized based on the peak value or the like.
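The selection of the evaluation period and the optional peak normalization can be sketched as follows. This is only an illustration under stated assumptions: the function name `evaluation_window` and its parameters are hypothetical, samples are assumed to be a list of floats, and normalization by the absolute peak is one possible reading of "normalized based on the peak value or the like".

```python
def evaluation_window(samples, sample_rate, start_s=5.0, end_s=35.0):
    """Return the samples recorded between start_s and end_s, peak-normalized.

    The 5 s to 35 s defaults follow the example in the text; the function
    name and signature are hypothetical.
    """
    first = int(start_s * sample_rate)
    last = int(end_s * sample_rate)
    window = samples[first:last]
    # Largest absolute amplitude in the window (0.0 if the window is empty).
    peak = max((abs(x) for x in window), default=0.0)
    if peak == 0.0:
        # Nothing to scale: the window is empty or entirely zero.
        return list(window)
    return [x / peak for x in window]
```

After normalization the largest absolute amplitude in the window is 1, which makes a fixed sound pressure threshold less dependent on the recording level.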

The voice processing unit 51 shown in FIG. 1 Fourier transforms the voice data, obtained by recording the voice of the subject to be evaluated, for each data block of a unit time (for example, about 0.07 seconds) and generates, for each data block, voiceprint data representing the sound pressure distribution over a plurality of frequency bands. A data block of voice data generally corresponds to what is called a frame of voice data, but in the present application the term "data block" is used for voice data to distinguish it from a frame of image data.

FIG. 3 is a diagram showing an example of the sound pressure distribution represented by voiceprint data. In FIG. 3, the horizontal axis is a time axis representing 2 × time [seconds], and the vertical axis is a frequency axis representing frequency on a logarithmic scale. The brightness of each frequency region represents the sound pressure [dB]: the higher the sound pressure of a frequency region, the closer to white it is displayed. Alternatively, a three-dimensional display may be used in which a sound pressure axis orthogonal to the time axis and the frequency axis represents the sound pressure [dB].

An example of a method of generating voiceprint data from voice data is described here. The voice processing unit 51 shown in FIG. 1 applies a Hamming window to the voice waveform represented by the voice data, thereby dividing the time-series voice data at predetermined time intervals and creating a plurality of data blocks along the time axis. For example, when the sampling frequency is about 44 kHz, one data block contains 2048 samples of voice data. Two consecutive data blocks may each contain a plurality of overlapping samples.
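A minimal sketch of this block-splitting step is shown below. The block size of 2048 samples comes from the example above; the hop of 1024 samples (50% overlap), the function names, and the decision to discard a trailing partial block are assumptions made for illustration only.

```python
import math


def hamming(n):
    """Hamming window coefficients of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def split_into_blocks(samples, block_size=2048, hop=1024):
    """Cut a sample sequence into Hamming-windowed data blocks.

    block_size=2048 follows the example in the text. hop=1024 (50% overlap)
    is an assumed value; the text only says that blocks may overlap.
    A trailing partial block is dropped.
    """
    window = hamming(block_size)
    blocks = []
    for start in range(0, len(samples) - block_size + 1, hop):
        blocks.append([samples[start + i] * window[i] for i in range(block_size)])
    return blocks
```

Setting `hop` equal to `block_size` gives non-overlapping blocks; a smaller hop gives the overlapping blocks that the text allows.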

Next, the voice processing unit 51 extracts a plurality of frequency components by Fourier transforming the voice data of each data block. For example, the voice processing unit 51 may apply fast Fourier transform (FFT) processing to the voice data. Because the frequency components obtained by the Fourier transform are complex numbers, the voice processing unit 51 obtains the absolute value of each frequency component.

The voice processing unit 51 multiplies the absolute values of these frequency components by frequency-domain windows, either one window per octave or windows defined according to the mel scale (a perceptual scale of pitch), and integrates the result to obtain an integrated value for the frequency band of each window. It then takes the logarithm of each integrated value to obtain the sound pressure [dB]. If there are 20 frequency-domain windows, the sound pressures in 20 frequency bands are obtained.
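The conversion of one data block into per-band sound pressures might look like the following sketch. Several points are assumptions rather than details from the text: rectangular one-octave windows are used instead of any particular window shape, the dB value is computed as 20·log10 of the summed magnitudes with no calibrated reference level, and all function names are hypothetical. The small recursive FFT is included only to keep the example self-contained.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("block length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def octave_band_edges(f_low, n_bands):
    """Band edges f_low, 2*f_low, 4*f_low, ... giving n_bands octave bands."""
    return [f_low * 2 ** i for i in range(n_bands + 1)]


def band_levels_db(block, sample_rate, edges, floor=1e-12):
    """Sound pressure [dB] per band for one data block.

    Sums the magnitudes of the FFT bins falling in each band (a rectangular
    frequency window, which is an assumption) and takes 20*log10 of the sum.
    """
    n = len(block)
    magnitudes = [abs(c) for c in fft(block)[: n // 2 + 1]]
    levels = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        total = sum(m for k, m in enumerate(magnitudes)
                    if lo <= k * sample_rate / n < hi)
        # The floor avoids log10(0) for an empty or silent band.
        levels.append(20.0 * math.log10(max(total, floor)))
    return levels
```

Calling `band_levels_db` on every data block yields one row of the voiceprint per block; mel-scale windows could be substituted by changing how `edges` is built.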

<Voice analysis>
The voice processing unit 51 stores the voiceprint data generated in this way in a voiceprint data storage unit 72. The voice analysis unit 52 reads the voiceprint data from the voiceprint data storage unit 72, classifies the data blocks according to the magnitude and spread of the sound pressure in the plurality of frequency bands, and evaluates the subject's voice based on the classification results of a predetermined number of data blocks.

For example, the voice analysis unit 52 classifies the data blocks according to whether the sound pressure exceeds a threshold in any frequency region, and further classifies the data blocks whose sound pressure exceeds the threshold in some frequency region according to the number of frequency bands in which the sound pressure exceeds the threshold and is a local maximum.

FIG. 4 is a diagram for explaining an example of voice evaluation based on voiceprint data. FIGS. 4(A) to 4(D) show examples of the sound pressure distributions represented by four kinds of voiceprint data. In FIGS. 3 and 4, a black frequency region in a data block is a frequency region in which the sound pressure is at or below a threshold (for example, 15 dB), and its frequency component is judged to be silence or noise.

As shown in FIGS. 4(A) and 4(B), when the subject speaks fluently with few breaks in the voice, the proportion of data blocks whose sound pressure exceeds the threshold in some frequency region is large. In particular, as shown in FIG. 4(A), when the subject's voice has rich, extended overtones that produce many stripes in the voiceprint, that is, a bright voice with a clear outline that carries well, there are many frequency bands in which the sound pressure exceeds the threshold and is a local maximum.

On the other hand, as shown in FIGS. 4(C) and 4(D), when the subject is at a loss for words and the speech tends to break off, the proportion of data blocks in which the sound pressure is at or below the threshold in all frequency regions becomes large. In particular, as shown in FIG. 4(D), when the subject's voice lacks overtone extension and shows few stripes in the voiceprint, giving a dark, muffled voice with a blurred outline, the number of frequency bands in which the sound pressure exceeds the threshold and reaches a local maximum becomes small.

Therefore, the voice analysis unit 52 gives a score S0 to each data block in which the sound pressure is at or below the threshold in all frequency regions, and gives a score higher than S0 to each data block in which the sound pressure exceeds the threshold in some frequency region.

Further, for each data block in which the sound pressure exceeds the threshold in some frequency region, the voice analysis unit 52 obtains the number of frequency bands in which the sound pressure exceeds the threshold and reaches a local maximum. Referring to FIGS. 3 and 4, in each data block, when a frequency region is brighter than the frequency regions immediately above and below it, the sound pressure has a local maximum in that frequency region.

Alternatively, when a three-dimensional display is used in which a sound pressure axis orthogonal to the time axis and the frequency axis represents the sound pressure [dB], the sound pressure has a local maximum in a frequency region of a data block when the sound pressure there is higher than in the frequency regions immediately above and below it, so that the sound pressure is convex toward the high-pressure side.

Among the data blocks in which the sound pressure exceeds the threshold in some frequency region, the voice analysis unit 52 gives a score S1 to each data block in which the number of frequency bands where the sound pressure exceeds the threshold and reaches a local maximum is at or below a predetermined value, and gives a score S2 higher than S1 to each data block in which that number exceeds the predetermined value.

Next, the voice analysis unit 52 determines a rank for the subject's voice based on the total or average of the scores of a predetermined number of data blocks. For example, using the number N0 of data blocks with score S0, the number N1 of data blocks with score S1, and the number N2 of data blocks with score S2, the average score AVE of the predetermined number (N) of data blocks is expressed by the following equation (1).
$$\mathrm{AVE} = \frac{S_0 N_0 + S_1 N_1 + S_2 N_2}{N} \quad \cdots (1)$$
Here, N0 to N2 are integers of zero or more, N is an integer of 3 or more, and the following equation (2) holds.
$$N = N_0 + N_1 + N_2 \quad \cdots (2)$$
For example, in equation (1), the scores may be set to S0 = 0, S1 = 1, and S2 = 3 to 5.
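The block scoring and equation (1) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: each data block is taken to be a list of per-band sound pressures in dB, only interior bands are tested for a local maximum, and the names `count_peaks`, `score_block`, `average_score`, and the value `peak_limit=2` for the "predetermined value" are hypothetical.

```python
from typing import Sequence

def count_peaks(block: Sequence[float], threshold: float) -> int:
    """Count interior bands that exceed the threshold and are higher than both neighbours."""
    return sum(
        1
        for k in range(1, len(block) - 1)
        if block[k] > threshold and block[k - 1] < block[k] > block[k + 1]
    )

def score_block(block, threshold=15.0, peak_limit=2, s0=0, s1=1, s2=3):
    """Return S0, S1, or S2 for one data block (peak_limit is an assumed value)."""
    if max(block) <= threshold:  # at or below the threshold in every band
        return s0
    return s2 if count_peaks(block, threshold) > peak_limit else s1

def average_score(blocks, **kwargs):
    """Equation (1): the mean of the per-block scores."""
    scores = [score_block(b, **kwargs) for b in blocks]
    return sum(scores) / len(scores)

# Illustrative data: one silent block, one with a single peak, one with three peaks.
silent = [10, 12, 9, 8, 11, 10, 9]
few_peaks = [10, 20, 10, 12, 11, 10, 9]
rich = [10, 20, 10, 25, 10, 30, 10]
ave = average_score([silent, few_peaks, rich])
```

With S0 = 0, S1 = 1, and S2 = 3, the three sample blocks score 0, 1, and 3, so `ave` is their mean.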

The voice analysis unit 52 may determine the rank of the subject's voice by comparing the total or average of the scores of the predetermined number of data blocks with at least one preset reference value. For this purpose, the learning data storage unit 73 stores in advance, as judgment learning data, voice data obtained by recording the voices of pseudo-subjects such as internship applicants, together with evaluation data representing the ranks and the like that evaluators assigned after actually evaluating those voices. By functioning as an AI (artificial intelligence) that performs machine learning on the judgment learning data, the voice analysis unit 52 may set the at least one reference value so that its judgments come close to those in the judgment learning data, and then determine the rank of the subject's voice.

For example, when evaluating the subject's voice on a four-level scale, the voice analysis unit 52 compares the average value AVE with first to third reference values. When AVE is at or below the first reference value, the voice analysis unit 52 judges the subject's voice to be rank RA0 (extremely poor voice); when AVE is greater than the first reference value and at or below the second reference value, it judges the voice to be rank RA1 (poor voice). When AVE is greater than the second reference value and at or below the third reference value, it judges the voice to be rank RA2 (ordinary voice); when AVE is greater than the third reference value, it judges the voice to be rank RA3 (good voice).
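The four-level comparison can be sketched as a sorted-threshold lookup. The function name `voice_rank` and the reference values (0.5, 1.0, 2.0) below are placeholders chosen for illustration; the source does not give concrete reference values and says they may be learned from the judgment learning data.

```python
from bisect import bisect_left

def voice_rank(ave: float, refs=(0.5, 1.0, 2.0)) -> str:
    """Map AVE to RA0..RA3; refs are hypothetical first to third reference values.

    bisect_left returns the number of reference values strictly below ave,
    so a value equal to a reference value falls in the lower rank.
    """
    return f"RA{bisect_left(refs, ave)}"

ranks = [voice_rank(a) for a in (0.5, 0.6, 1.0, 2.0, 2.5)]
```

Using `bisect_left` rather than `bisect_right` matches the "at or below" boundaries in the text.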

Note that when the period in which speech is actually recorded within the voice data for the evaluation period does not reach a certain length (for example, 15 seconds), the voice analysis unit 52 may judge the subject's voice to be rank RA0. The voice analysis unit 52 stores voice evaluation data representing the voice evaluation result obtained in this way in the evaluation data storage unit 74.

<Image processing>
When moving image data has been stored in the raw data storage unit 71, the image processing unit 53 reads the moving image data from the raw data storage unit 71. For example, the moving image data represents images at 24 frames per second, and the visual evaluation uses the moving image data for a 55-second evaluation period running from 5 seconds to 60 seconds after the start of imaging.

The image processing unit 53 applies face recognition processing, frame by frame, to the moving image data obtained by imaging the subject's face, thereby extracting a plurality of feature points recognized on the subject's face and obtaining their coordinates. An example of face recognition processing, which is one kind of image processing, is described below.

First, the image processing unit 53 detects the position of the subject's face in the image represented by one frame of moving image data (hereinafter also referred to as the "input image"). For example, the image processing unit 53 can detect the position, region, and so on of the subject's face using software such as OpenCV.

Next, the image processing unit 53 recognizes the subject's face in the input image using the one frame of moving image data and the face recognition learning data stored in advance in the learning data storage unit 73. This face recognition processing uses, for example, an active appearance model (AAM). The image processing unit 53 then determines whether the subject's face was recognized.

When the subject's face has been recognized, the image processing unit 53 extracts a plurality of feature points recognized on the face and obtains their coordinates in the input image. The image processing unit 53 then stores the coordinates of the feature points, together with the frame number, in the coordinate data storage unit 75. The image processing unit 53 may obtain the coordinates of the feature points in the input image as pixel numbers.

<Details of face recognition processing>
The learning data storage unit 73 stores in advance, as face recognition learning data, image data representing images captured beforehand of, for example, a standard human face or a model of one, together with the coordinates of a plurality of feature points set in those images. Based on the face recognition learning data, the image processing unit 53 applies face recognition processing to the image data obtained by imaging the subject's face, thereby extracting a plurality of feature points from the subject's face and obtaining their coordinates.

The active appearance model that can be used in the face recognition processing above separates the image of the target object into shape and texture (appearance) and reduces the dimensionality of each by principal component analysis, so that changes in the target's shape and texture can be expressed with a small number of parameters. The shape and texture information can thus be represented by low-dimensional parameters.

In the active appearance model, the shape vector x, in which all feature points are arranged, is expressed by the following equation (3) using the mean shape vector u obtained in advance from the face recognition learning data and the eigenvector matrix P_s obtained by principal component analysis of the deviations from the mean shape vector u.
$$x = u + P_s b_s \quad \cdots (3)$$
Here, b_s is a parameter vector called the shape parameter.
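Equation (3) is a linear synthesis, and when the columns of P_s are orthonormal (as eigenvectors from a principal component analysis are), the shape parameter can be recovered by projection, b_s = P_s^T (x - u). The sketch below illustrates both directions in plain Python; the toy mean shape, the two basis vectors, and the function names `synthesize_shape` and `project_shape` are assumptions made for this example only.

```python
def synthesize_shape(u, P_s, b_s):
    """Equation (3): x = u + P_s b_s, with P_s given as a list of rows."""
    return [u_i + sum(p * b for p, b in zip(row, b_s)) for u_i, row in zip(u, P_s)]

def project_shape(x, u, P_s):
    """Recover b_s = P_s^T (x - u); valid when the columns of P_s are orthonormal."""
    diff = [x_i - u_i for x_i, u_i in zip(x, u)]
    n_params = len(P_s[0])
    return [sum(P_s[i][j] * diff[i] for i in range(len(diff))) for j in range(n_params)]

# Toy model: a 4-dimensional shape vector (two 2-D points) and two orthonormal modes.
u = [0.0, 0.0, 1.0, 0.0]
P_s = [
    [1.0, 0.0],
    [0.0, 0.6],
    [0.0, 0.0],
    [0.0, 0.8],
]
b_s = [2.0, 5.0]
x = synthesize_shape(u, P_s, b_s)
b_recovered = project_shape(x, u, P_s)
```

The appearance model of equation (4) has the same form, with v, P_g, and b_g in place of u, P_s, and b_s.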

Similarly, the appearance vector g, in which the luminance values of the normalized texture are arranged, is expressed by the following equation (4) using the mean appearance vector v obtained in advance from the face recognition learning data and the eigenvector matrix P_g obtained by principal component analysis of the deviations from the mean appearance vector v.
$$g = v + P_g b_g \quad \cdots (4)$$
Here, b_g is a parameter vector called the appearance parameter. The shape parameter b_s and the appearance parameter b_g represent changes from the mean, and changing them changes the shape and the appearance.

Furthermore, because shape and appearance are correlated, a further principal component analysis of the shape parameter b_s and the appearance parameter b_g yields a low-dimensional parameter vector c (hereinafter also referred to as the "coupling parameter") that controls both shape and appearance. Using c, the shape vector x(c) and the texture vector g(c) are expressed by the following equations (5) and (6).
$$x(c) = u + P_s W_s^{-1} Q_s c \quad \cdots (5)$$
$$g(c) = v + P_g Q_g c \quad \cdots (6)$$
Here, W_s is a matrix that normalizes the difference in units between the shape vector and the appearance vector, Q_s is the eigenvector matrix for shape, and Q_g is the eigenvector matrix for appearance. Controlling the coupling parameter c in this way makes it possible to handle shape and appearance simultaneously and to express changes in the target.

Next, consider a parameter q (hereinafter also referred to as the "posture parameter") describing global changes: where in the image the target is located, at what size, and in what orientation. The posture parameter q is expressed by the following equation (7).
$$q = [\,\mathrm{roll} \;\; \mathrm{scale} \;\; \mathrm{trans\_x} \;\; \mathrm{trans\_y}\,] \quad \cdots (7)$$
Here, roll represents the rotation angle of the model with respect to the image plane, scale represents the size of the model, and trans_x and trans_y represent the translation of the model in the x-axis and y-axis directions, respectively.

In the active appearance model, model search means generating an image of the target by varying the model locally and globally through the coupling parameter c and the posture parameter q, comparing the generated image with the input image, and finding the coupling parameter c and posture parameter q that minimize the error. The active appearance model allows feature points to be extracted quickly and robustly against changes in the orientation of the target.

Specifically, for a given coupling parameter c' and posture parameter q', let W(X; q', b_s') denote the function that deforms a shape X according to the posture parameter q' and the shape parameter b_s' obtained from c'. Further, let I(Img, X) denote the function that returns the luminance values inside the shape X when an input image Img and a shape X are given. The error value Er in the model search is then expressed by the following equation (8).
$$Er = \left[ (v + P_g Q_g c') - I(\mathrm{Img},\, W(X; q', b_s')) \right]^2 \quad \cdots (8)$$

For example, when error values are obtained for K shapes X(1), X(2), ..., X(K) making up the subject's face (K is a natural number), and these error values are denoted Er(1), Er(2), ..., Er(K), the fit rate Fr, an index of the recognition error in the face recognition processing, is expressed by the following equation (9).
$$Fr = \frac{Er(1) + Er(2) + \cdots + Er(K)}{K} \quad \cdots (9)$$
Accordingly, highly accurate face recognition processing can be performed by determining the coupling parameter c and the posture parameter q so that the error value Er or the fit rate Fr becomes small.

Next, the image processing unit 53 determines whether the fit rate of the subject's face obtained as a result of face recognition in the input image is at or below a preset threshold. When the fit rate is at or below the threshold, the image processing unit 53 judges that the subject's face was recognized; when the fit rate exceeds the threshold, it judges that the subject's face could not be recognized.
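The fit-rate decision can be sketched as follows. The sketch reads the square in equation (8) as a sum of squared luminance differences, which is an interpretation rather than something the source states, and the function names and sample textures are hypothetical. Note that a smaller fit rate means a better fit, so recognition succeeds when the value is at or below the threshold.

```python
def error_value(model_texture, sampled_texture):
    """Equation (8), read as the sum of squared luminance differences (an assumption)."""
    return sum((m - s) ** 2 for m, s in zip(model_texture, sampled_texture))

def fit_rate(errors):
    """Equation (9): the mean of the K per-shape error values."""
    return sum(errors) / len(errors)

def face_recognized(errors, threshold):
    """The face counts as recognized when the fit rate is at or below the threshold."""
    return fit_rate(errors) <= threshold

# Two toy shapes: one with a luminance mismatch, one that matches exactly.
errors = [
    error_value([1.0, 2.0], [1.0, 4.0]),
    error_value([3.0], [3.0]),
]
fr = fit_rate(errors)
```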

<Image analysis>
The image analysis unit 54 reads the coordinates of the plurality of feature points in a predetermined number of frames from the coordinate data storage unit 75, calculates the amount of movement of the subject's face from those coordinates, and evaluates the subject visually based on statistical processing of the amount of facial movement over the evaluation period. When the moving image data represents images at 24 frames per second, the predetermined number of frames may be 24, corresponding to one second.

For example, the image analysis unit 54 may calculate the amount of movement of the subject's face about a first axis determined from the positions of specific parts in the image of the subject represented by the moving image data, about a second axis substantially orthogonal to the first axis, or about a third axis substantially orthogonal to both the first and second axes. In that case, the amount of facial movement is calculated from the change in the orientation of the subject's face over the predetermined number of frames.

FIG. 5 is a diagram showing an example of an image used to calculate the amount of movement of the subject's face about the first axis. In this example, the first axis is parallel to the line connecting the inner corners of the subject's right and left eyes (the X axis in the figure). For example, the calculation uses a feature point P0 at the midpoint between the inner corners of the right and left eyes, a feature point P1 at the right edge of the nose, a feature point P2 at the left edge of the nose, and a feature point P3 at the midpoint between the right and left corners of the mouth.

As shown in FIG. 5, the feature points P0 to P2 form a first triangle on the upper side of the figure, and the feature points P1 to P3 form a second triangle on the lower side. The ratio of the areas or heights of the first and second triangles, as seen from an image sensor such as a video camera, is used as the quantity representing the orientation of the subject's face for movement about the first axis.

FIG. 6 is a diagram for explaining how the ratio of the heights of the first and second triangles changes as the subject's face moves about the first axis. As shown on the left side of FIG. 6, when the subject faces the front of an image sensor such as a video camera, the ratio H1/H2 of the height H1 of the first triangle to the height H2 of the second triangle, as seen from the image sensor, is taken to be A.

On the other hand, as shown on the right side of FIG. 6, when the subject nods or otherwise turns the face downward from the front of the image sensor, the second triangle moves farther from the front of the image sensor than the first triangle and its angle changes, so the ratio H1'/H2' of the height H1' of the first triangle to the height H2' of the second triangle, as seen from the image sensor, becomes A' (A' > A). The image analysis unit 54 may obtain the heights and other dimensions of the first and second triangles as pixel counts, which simplifies the distance calculation.
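As a rough illustration of the height ratio H1/H2, the sketch below measures each triangle's height as the distance from its apex (P0 or P3) to the line through the shared base P1-P2. The function names and the pixel coordinates are made up for the example; they are not taken from the figures.

```python
import math

def height_above_base(apex, a, b):
    """Distance from apex to the line through a and b (triangle height over base a-b)."""
    cross = (b[0] - a[0]) * (apex[1] - a[1]) - (b[1] - a[1]) * (apex[0] - a[0])
    return abs(cross) / math.hypot(b[0] - a[0], b[1] - a[1])

def height_ratio(p0, p1, p2, p3):
    """H1 / H2 for the first triangle (P0, P1, P2) and the second triangle (P1, P2, P3)."""
    return height_above_base(p0, p1, p2) / height_above_base(p3, p1, p2)

# Hypothetical pixel coordinates (x to the right, y downward).
p0, p1, p2 = (50, 30), (40, 60), (60, 60)
frontal = height_ratio(p0, p1, p2, (50, 80))   # mouth midpoint 20 px below the nose line
nodding = height_ratio(p0, p1, p2, (50, 75))   # lower triangle foreshortened to 15 px
```

In this toy data the ratio rises from 1.5 to 2.0 when the lower triangle is foreshortened, matching the A' > A behaviour described above.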

FIG. 7 is a diagram showing an example of an image used to calculate the amount of movement of the subject's face about the second axis. In this example, the second axis is parallel to the line (the Y axis in the figure) connecting the midpoint between the inner corners of the subject's right and left eyes with the midpoint between the right and left corners of the mouth. For example, the calculation uses the feature point P1 at the right edge of the nose, the feature point P2 at the left edge of the nose, a feature point P4 at the right corner of the mouth, a feature point P5 at the left corner of the mouth, a feature point P6 at the inner corner of the right eye, and a feature point P7 at the inner corner of the left eye.

As shown in FIG. 7, the feature points P1, P4, and P6 form a first triangle on the left side of the figure, and the feature points P2, P5, and P7 form a second triangle on the right side. The ratio of the areas of the first and second triangles, as seen from an image sensor such as a video camera, is used as the quantity representing the orientation of the subject's face for movement about the second axis.

FIG. 8 is a diagram for explaining how the ratio of the areas of the first and second triangles changes as the subject's face moves about the second axis. As shown on the left side of FIG. 8, when the subject faces the front of an image sensor such as a video camera, the ratio S1/S2 of the area S1 of the first triangle to the area S2 of the second triangle, as seen from the image sensor, is taken to be B.

On the other hand, as shown on the right side of FIG. 8, when the subject turns the face to the right of the front of the image sensor, the first triangle moves farther from the front of the image sensor than the second triangle and its angle changes, so the ratio S1'/S2' of the area S1' of the first triangle to the area S2' of the second triangle, as seen from the image sensor, becomes B' (B' < B).
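The area ratio S1/S2 can be computed directly from the feature-point coordinates with the cross-product (shoelace) formula. The sketch below is illustrative only: the function names and the pixel coordinates are invented, and the "turned" case simply widens the second triangle to mimic the change described above.

```python
def triangle_area(a, b, c):
    """Area of triangle a-b-c from the 2-D cross product."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])) / 2

def area_ratio(p1, p4, p6, p2, p5, p7):
    """S1 / S2 for the first triangle (P1, P4, P6) and the second triangle (P2, P5, P7)."""
    return triangle_area(p1, p4, p6) / triangle_area(p2, p5, p7)

# Hypothetical pixel coordinates: a symmetric frontal face, then a turned face.
first = ((45, 60), (40, 80), (40, 40))                       # P1, P4, P6
frontal = area_ratio(*first, (55, 60), (60, 80), (60, 40))   # mirrored second triangle
turned = area_ratio(*first, (57, 60), (64, 80), (64, 40))    # second triangle appears wider
```

Here `frontal` is 1.0 and `turned` is smaller, consistent with B' < B.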

FIG. 9 is a diagram showing an example of an image used to calculate the amount of movement of the subject's face about the third axis. In this example, the third axis is parallel to the Z axis, which is orthogonal to the X and Y axes in the figure. For example, the calculation uses the feature point P6 at the inner corner of the subject's right eye and the feature point P7 at the inner corner of the left eye.

As shown in FIG. 9, the angle θ between the line connecting the feature point P6 at the inner corner of the right eye and the feature point P7 at the inner corner of the left eye (the solid line in the figure) and a line parallel to the X axis (the broken line in the figure), or a trigonometric function value of θ (sin θ, cos θ, tan θ, etc.), is used as the quantity representing the orientation of the subject's face for movement about the third axis.
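The angle θ can be obtained from the two eye-corner coordinates with `math.atan2`. This is a small sketch with invented coordinates; the sign convention (image y axis pointing downward) and the function name `roll_angle_deg` are assumptions.

```python
import math

def roll_angle_deg(p6, p7):
    """Angle in degrees between the line P6-P7 and the image X axis."""
    return math.degrees(math.atan2(p7[1] - p6[1], p7[0] - p6[0]))

level = roll_angle_deg((40, 40), (60, 40))    # eyes level with the X axis
tilted = roll_angle_deg((40, 40), (60, 60))   # eye line inclined relative to the X axis
```

`atan2` is used instead of dividing the differences so that a vertical eye line does not cause a division by zero.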

Referring again to FIG. 1, the image analysis unit 54 calculates, as the amount of movement of the subject's face, the variance of the quantity representing the orientation of the face over a predetermined number of frames, and determines a rank for the visual evaluation of the subject based on the probability distribution of the variances over the evaluation period. For example, when 24 frames are treated as one block for calculating one variance, each of two consecutive blocks may include 12 overlapping frames.

For a predetermined number (L) of frames, the variance V_X of the quantity X(i) representing the orientation of the subject's face for movement about the first axis parallel to the X axis is defined by the following equation (10) (L is an integer of 2 or more).
$$V_X = \frac{1}{L} \sum_{i=1}^{L} \left( X(i) - E_X \right)^2 \quad \cdots (10)$$
Here, i = 1 to L, and E_X is the mean of X(i) over the L frames.
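Equation (10), applied to 24-frame blocks that overlap by 12 frames, can be sketched as a sliding-window population variance. The function name `block_variances` and the sample sequence are illustrative assumptions; the block length and overlap follow the example values given in the text.

```python
def block_variances(values, block=24, hop=12):
    """Population variance (1/L normalisation) over overlapping blocks of `block` frames."""
    variances = []
    for start in range(0, len(values) - block + 1, hop):
        window = values[start:start + block]
        mean = sum(window) / block
        variances.append(sum((v - mean) ** 2 for v in window) / block)
    return variances

# 48 frames alternating between 0 and 2 give three blocks (frames 0-23, 12-35, 24-47).
orientation = [0.0, 2.0] * 24
v_x = block_variances(orientation)
```

Each block of the alternating sequence has mean 1 and squared deviations of 1, so every variance in `v_x` is 1.0.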

For example, a plurality of variances V_X are obtained from the 24 × 55 frames of moving image data in the evaluation period. The image analysis unit 54 sorts the variances V_X in the evaluation period into M classes according to their magnitude (M is an integer of 2 or more), thereby obtaining the existence probability P_X(j) of the variances V_X(j) belonging to the j-th class (j = 1 to M).
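Sorting the variances into M classes and taking relative frequencies amounts to building a normalised histogram. In the sketch below, the class width of 50, the choice of M = 4, and the decision to place any variance beyond the last class boundary into the last class are assumptions made for illustration; the function name `class_probabilities` is hypothetical.

```python
from collections import Counter

def class_probabilities(variances, width=50.0, n_classes=4):
    """Relative frequency of variances per class; overflow goes into the last class."""
    counts = Counter(min(int(v // width), n_classes - 1) for v in variances)
    total = len(variances)
    return [counts[j] / total for j in range(n_classes)]

# Five sample variances fall into classes 0, 0, 1, 2, and 3 (990 is clamped).
p_x = class_probabilities([10.0, 20.0, 60.0, 120.0, 990.0])
```

The returned list sums to 1 and plays the role of P_X(j) for j = 1 to M (with zero-based indexing in the code).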

Similarly, for L frames, the variance V_Y of the quantity Y(i) representing the orientation of the subject's face for movement about the second axis parallel to the Y axis is defined by the following equation (11).
$$V_Y = \frac{1}{L} \sum_{i=1}^{L} \left( Y(i) - E_Y \right)^2 \quad \cdots (11)$$
Here, i = 1 to L, and E_Y is the mean of Y(i) over the L frames. The image analysis unit 54 sorts the variances V_Y in the evaluation period into M classes according to their magnitude, thereby obtaining the existence probability P_Y(j) of the variances V_Y(j) belonging to the j-th class (j = 1 to M).

Likewise, for L frames, the variance V_Z of the quantity Z(i) representing the orientation of the subject's face for movement about the third axis parallel to the Z axis is defined by the following equation (12).
$$V_Z = \frac{1}{L} \sum_{i=1}^{L} \left( Z(i) - E_Z \right)^2 \quad \cdots (12)$$
Here, i = 1 to L, and E_Z is the mean of Z(i) over the L frames. The image analysis unit 54 sorts the variances V_Z in the evaluation period into M classes according to their magnitude, thereby obtaining the existence probability P_Z(j) of the variances V_Z(j) belonging to the j-th class (j = 1 to M).

FIG. 10 is a diagram showing an example of the probability distributions of the variances of the quantities representing the orientation of the subject's face in the evaluation period. In FIG. 10, the horizontal axis shows the three kinds of variances V_X(j), V_Y(j), and V_Z(j) in class intervals of 50, and the vertical axis shows their existence probabilities P_X(j), P_Y(j), and P_Z(j). To display the three kinds of variances in one figure, they are drawn with their positions offset from one another. At least one of the three kinds of variances is used as the amount of movement of the subject's face.

図１に示す画像解析部５４は、例えば、評価期間に相当する動画像データにおいて被検者の顔の特徴点の座標を求めることができた割合が一定の割合（例えば６０％）に達しない場合に、その被検者の画像をランクＲＶ０（未評価）と判定する。一方、画像解析部５４は、被検者の顔の特徴点の座標を求めることができた割合が一定の割合以上である場合に、被検者の顔の動き量に応じて、その被検者の画像をランクＲＶ１以上の複数のランクのいずれかに分類する。 For example, when the proportion of the moving image data corresponding to the evaluation period in which the coordinates of the feature points of the subject's face could be obtained does not reach a certain proportion (for example, 60%), the image analysis unit 54 shown in FIG. 1 determines the image of the subject to be rank RV0 (unevaluated). On the other hand, when that proportion is equal to or greater than the certain proportion, the image analysis unit 54 classifies the image of the subject into one of a plurality of ranks of RV1 or higher according to the amount of movement of the subject's face.

一般に、被検者が言葉に詰まって考えながら話す場合には、顔の動きが止まりがちになり、被検者が説得力を持って流暢に話す場合には、顔の動きが活発になる。そこで、画像解析部５４は、被検者の顔の動き量が所定の基準量よりも総体的に小さければ、その被検者の画像をランクＲＶ１（小さい動き）と判定し、被検者の顔の動き量が基準量よりも総体的に大きければ、その被検者の画像をランクＲＶ２（大きい動き）と判定しても良い。 In general, when a subject is at a loss for words and speaks while thinking, the movement of the face tends to stop, and when a subject speaks persuasively and fluently, the movement of the face becomes active. Therefore, the image analysis unit 54 may determine the image of the subject to be rank RV1 (small movement) if the amount of movement of the subject's face is smaller overall than a predetermined reference amount, and to be rank RV2 (large movement) if it is larger overall than the reference amount.

例えば、画像解析部５４は、少なくとも１種類の分散値の確率分布を、予め設定された基準量の確率分布と比較して、被検者の視覚的評価に関するランクを判定しても良い。そのために、学習データ格納部７３には、例えば、インターシップ応募者等の疑似被検者の顔を撮像して得られた動画像データと、実際に評価者がその画像を評価して判定したランク等を表す評価データとが、判定学習データとして予め格納されている。画像解析部５４は、判定学習データを用いて機械学習を行うＡＩ（人工知能）として機能することにより、判定学習データに近い判定結果が得られるように基準量の確率分布や比較方法を設定して、被検者の視覚的評価に関するランクを判定しても良い。 For example, the image analysis unit 54 may compare the probability distribution of at least one type of variance value with the probability distribution of a preset reference amount to determine the rank of the subject with respect to the visual evaluation. For this purpose, the learning data storage unit 73 stores in advance, as determination learning data, moving image data obtained by imaging the faces of pseudo-subjects such as internship applicants, together with evaluation data representing the ranks and the like that evaluators actually assigned after evaluating those images. By functioning as an AI (artificial intelligence) that performs machine learning using the determination learning data, the image analysis unit 54 may set the probability distribution of the reference amount and the comparison method so that determination results close to the determination learning data are obtained, and then determine the rank of the subject with respect to the visual evaluation.

あるいは、画像解析部５４は、少なくとも１つの階級に属する分散値Ｖ_Ｘ（ｊ）、Ｖ_Ｙ（ｊ）、Ｖ_Ｚ（ｊ）の合計値又は平均値を被検者の顔の動き量として求め、被検者の顔の動き量が所定の基準量よりも小さければ、その被検者の画像をランクＲＶ１（小さい動き）と判定し、被検者の顔の動き量が基準量よりも大きければ、その被検者の画像をランクＲＶ２（大きい動き）と判定しても良い。 Alternatively, the image analysis unit 54 may obtain the total or average of the variance values V_X(j), V_Y(j), and V_Z(j) belonging to at least one class as the amount of movement of the subject's face, and determine the image of the subject to be rank RV1 (small movement) if that amount is smaller than a predetermined reference amount, or rank RV2 (large movement) if it is larger than the reference amount.

なお、評価期間に相当する動画像データにおいて被検者の顔が録画されている期間が一定の期間（例えば１５秒）に達しないような場合には、画像解析部５４は、その被検者の画像をランクＲＶ０と判定しても良い。画像解析部５４は、このようにして得られた視覚的評価結果を表す視覚的評価データを評価データ格納部７４に格納する。 If the period during which the subject's face is recorded in the moving image data corresponding to the evaluation period does not reach a certain period (for example, 15 seconds), the image analysis unit 54 may determine the image of the subject to be rank RV0. The image analysis unit 54 stores visual evaluation data representing the visual evaluation result thus obtained in the evaluation data storage unit 74.
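The visual ranking rules above can be sketched as a single decision function. This is a minimal sketch, not the patented implementation: the function and parameter names are hypothetical, the 60% and 15-second defaults are the example values given in the text, and treating a movement amount exactly equal to the reference amount as RV2 is an assumption, since the text does not cover that case.

```python
def visual_rank(frames_with_landmarks, total_frames, face_seconds,
                movement_amount, reference_amount,
                min_ratio=0.6, min_face_seconds=15.0):
    """Return 'RV0', 'RV1', or 'RV2' for one subject's video."""
    # RV0 (unevaluated): no frames, or feature points found in too few frames.
    if total_frames == 0 or frames_with_landmarks / total_frames < min_ratio:
        return "RV0"
    # RV0: the face is recorded for too short a period.
    if face_seconds < min_face_seconds:
        return "RV0"
    # RV1 for small movement, RV2 otherwise (equality -> RV2 is an assumption).
    return "RV1" if movement_amount < reference_amount else "RV2"
```

Keeping the two RV0 checks ahead of the movement comparison mirrors the order in the text, where the movement amount is only consulted once enough usable frames exist.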

＜総合評価＞
同一被検者の音声評価データ及び視覚的評価データが評価データ格納部７４に格納された場合に、総合評価部５５は、評価データ格納部７４から音声評価データ及び視覚的評価データを読み出して、音声解析部５２による評価結果と画像解析部５４による評価結果とに基づいて被検者の人物評価を行う。例えば、総合評価部５５は、音声評価における複数のランクと視覚的評価における複数のランクとに基づいて２次元状に配列された複数のマッピングエリアを用いて被検者の人物評価を行う。
<Comprehensive evaluation>
When the voice evaluation data and the visual evaluation data of the same subject have been stored in the evaluation data storage unit 74, the comprehensive evaluation unit 55 reads them out from the evaluation data storage unit 74 and performs the person evaluation of the subject based on the evaluation result from the voice analysis unit 52 and the evaluation result from the image analysis unit 54. For example, the comprehensive evaluation unit 55 performs the person evaluation using a plurality of mapping areas arranged two-dimensionally according to the plurality of ranks in the voice evaluation and the plurality of ranks in the visual evaluation.

図１１は、被検者の人物評価を行うために用いられるマッピングエリアの例を示す図である。図１１に示すように、音声評価は、ランクＲＡ０（極めて悪い音声）と、ランクＲＡ１（悪い音声）と、ランクＲＡ２（普通の音声）と、ランクＲＡ３（良い音声）とに分かれている。一方、視覚的評価は、ランクＲＶ０（未評価）と、ランクＲＶ１（小さい動き）と、ランクＲＶ２（大きい動き）とに分かれている。 FIG. 11 is a diagram showing an example of the mapping areas used for the person evaluation of a subject. As shown in FIG. 11, the voice evaluation is divided into rank RA0 (extremely bad voice), rank RA1 (bad voice), rank RA2 (normal voice), and rank RA3 (good voice). The visual evaluation, on the other hand, is divided into rank RV0 (unevaluated), rank RV1 (small movement), and rank RV2 (large movement).

例えば、音声評価がランクＲＡ０又はＲＡ１であるエリア０〜５と、音声評価がランクＲＡ２であって視覚的評価がランクＲＶ１であるエリア７とが、不合格エリアに設定される。なお、音声評価がランクＲＡ２であっても視覚的評価がランクＲＶ０であるエリア６は、さらなる人間チェックが必要とされる人間チェックエリアに設定される。一方、音声評価がランクＲＡ２であって視覚的評価がランクＲＶ２であるエリア８と、音声評価がランクＲＡ３であるエリア９〜１１とは、合格エリアに設定される。 For example, areas 0 to 5, in which the voice evaluation is rank RA0 or RA1, and area 7, in which the voice evaluation is rank RA2 and the visual evaluation is rank RV1, are set as fail areas. Area 6, in which the voice evaluation is rank RA2 but the visual evaluation is rank RV0, is set as a human check area requiring a further check by a person. On the other hand, area 8, in which the voice evaluation is rank RA2 and the visual evaluation is rank RV2, and areas 9 to 11, in which the voice evaluation is rank RA3, are set as pass areas.

図１に示す総合評価部５５は、音声評価データによって表される被検者の音声に関するランクと、視覚的評価データによって表される被検者の視覚的評価に関するランクとに基づいて、図１１に示すエリア０〜１１の内の１つを選択することにより、被検者の人物評価を行う。総合評価部５５は、このようにして得られた人物評価結果を表す人物評価データを評価データ格納部７４に格納する。 The comprehensive evaluation unit 55 shown in FIG. 1 performs the person evaluation of the subject by selecting one of areas 0 to 11 shown in FIG. 11 based on the rank of the subject's voice represented by the voice evaluation data and the rank of the subject's visual evaluation represented by the visual evaluation data. The comprehensive evaluation unit 55 stores person evaluation data representing the person evaluation result thus obtained in the evaluation data storage unit 74.
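The area selection described for FIG. 11 can be sketched as a small lookup. The index formula 3 × RA + RV is an inference from the area numbers given in the text (RA0 and RA1 cover areas 0 to 5, RA2 covers areas 6 to 8, RA3 covers areas 9 to 11), not something the specification states explicitly, and the function names and result labels are hypothetical.

```python
FAIL, HUMAN_CHECK, PASS = "fail", "human check", "pass"


def mapping_area(ra, rv):
    """Map a voice rank RA0..RA3 and a visual rank RV0..RV2 to area 0..11.

    Assumption: areas are numbered row by row, i.e. area = 3 * RA + RV.
    """
    if not (0 <= ra <= 3 and 0 <= rv <= 2):
        raise ValueError("rank out of range")
    return 3 * ra + rv


def person_evaluation(ra, rv):
    """Return (area, result) following the example assignment in the text."""
    area = mapping_area(ra, rv)
    if area <= 5 or area == 7:   # RA0/RA1, or RA2 with small movement
        return area, FAIL
    if area == 6:                # RA2 but visual evaluation unevaluated
        return area, HUMAN_CHECK
    return area, PASS            # area 8 and areas 9 to 11
```

For instance, `person_evaluation(2, 0)` lands in area 6 and is routed to a human check rather than being failed outright.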

＜人物評価方法＞
次に、本発明の一実施形態に係る人物評価装置において用いられる人物評価方法について、図１〜図１２を参照しながら説明する。図１２は、本発明の一実施形態に係る人物評価方法を示すフローチャートである。なお、互いに独立な処理については、それらを並列に行っても良い。
<Person evaluation method>
Next, the person evaluation method used in the person evaluation device according to an embodiment of the present invention will be described with reference to FIGS. 1 to 12. FIG. 12 is a flowchart showing the person evaluation method according to an embodiment of the present invention. Processes that are independent of each other may be performed in parallel.

図１２に示すステップＳ１１において、音声処理部５１が、被検者の音声を収録して得られる音声データを単位時間当りのデータブロック毎にフーリエ変換し、データブロック毎に複数の周波数帯域における音圧分布を表す声紋データを生成する。 In step S11 shown in FIG. 12, the voice processing unit 51 Fourier-transforms the voice data obtained by recording the voice of the subject for each data block per unit time, and generates voiceprint data representing the sound pressure distribution in a plurality of frequency bands for each data block.
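To make step S11 concrete, the sketch below splits a sample sequence into fixed-size data blocks, applies a naive discrete Fourier transform to each block, and sums the spectral magnitudes into a small number of frequency bands. It uses only the Python standard library, so the transform is O(n²) and suited to illustration only. The function names, the equal-width bands, and the use of summed magnitude as a stand-in for sound pressure are all assumptions.

```python
import cmath
import math


def dft_magnitudes(block):
    """Magnitudes of the non-negative-frequency DFT bins of one block."""
    n = len(block)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(block)))
        for k in range(n // 2 + 1)
    ]


def voiceprint(samples, block_size, num_bands):
    """Return one list of per-band magnitudes for each complete data block."""
    prints = []
    for start in range(0, len(samples) - block_size + 1, block_size):
        mags = dft_magnitudes(samples[start:start + block_size])
        per_band = len(mags) // num_bands  # DFT bins per band (assumption)
        prints.append([
            sum(mags[b * per_band:(b + 1) * per_band])
            for b in range(num_bands)
        ])
    return prints
```

A real implementation would more likely use a fast Fourier transform and a calibrated sound-pressure scale; this version only shows the block-by-block structure the step describes.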

ステップＳ１２において、音声解析部５２が、複数の周波数帯域における音圧の大きさ及び広がりに応じてデータブロックを分類し、所定数のデータブロックの分類結果に基づいて被検者の音声に関する評価を行う。それにより、被検者の音声に関するランクが判定される。 In step S12, the voice analysis unit 52 classifies the data blocks according to the magnitude and spread of sound pressure in the plurality of frequency bands, and evaluates the voice of the subject based on the classification results of a predetermined number of data blocks. The rank of the subject's voice is thereby determined.
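The scoring rule stated in the claims (a higher score when any band exceeds the threshold, and a still higher score when enough bands exceed the threshold and peak) can be sketched as follows. The score values 0, 1, and 2, the reading of "maximum" as a local maximum relative to neighboring bands, and the rank cutoffs are illustrative assumptions; the source gives no concrete numbers.

```python
def count_peak_bands(pressures, threshold):
    """Count bands above the threshold that are also a local maximum."""
    count = 0
    for k, p in enumerate(pressures):
        left = pressures[k - 1] if k > 0 else float("-inf")
        right = pressures[k + 1] if k + 1 < len(pressures) else float("-inf")
        if p > threshold and p > left and p > right:
            count += 1
    return count


def block_score(pressures, threshold, min_peaks):
    """Score one data block: 0, 1, or 2 (hypothetical values)."""
    if all(p <= threshold for p in pressures):
        return 0  # at or below the threshold in every band
    if count_peak_bands(pressures, threshold) > min_peaks:
        return 2  # many bands exceed the threshold and peak
    return 1      # exceeds the threshold somewhere, but few peaks


def voice_rank(blocks, threshold, min_peaks, cutoffs=(0.5, 1.0, 1.5)):
    """Rank RA0..RA3 from the average block score (cutoffs are assumptions)."""
    scores = [block_score(b, threshold, min_peaks) for b in blocks]
    average = sum(scores) / len(scores)
    return "RA" + str(sum(average >= c for c in cutoffs))
```

The claims allow either the total or the average of the block scores; the average is used here so that the cutoffs do not depend on the number of blocks.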

ステップＳ１３において、画像処理部５３が、被検者の顔を撮像して得られる動画像データに対してフレーム毎に顔認識処理を施すことにより、被検者の顔において認識される複数の特徴点を抽出し、複数の特徴点の座標を求める。 In step S13, the image processing unit 53 performs face recognition processing for each frame on the moving image data obtained by imaging the face of the subject, thereby extracting a plurality of feature points recognized on the subject's face and obtaining the coordinates of those feature points.

ステップＳ１４において、画像解析部５４が、所定数のフレームにおける複数の特徴点の座標に基づいて被検者の顔の動き量を算出し、評価期間における被検者の顔の動き量の統計処理に基づいて被検者の視覚的な評価を行う。それにより、被検者の視覚的評価に関するランクが判定される。 In step S14, the image analysis unit 54 calculates the amount of movement of the subject's face based on the coordinates of the plurality of feature points in a predetermined number of frames, and performs a visual evaluation of the subject based on statistical processing of the amount of movement of the subject's face during the evaluation period. The rank of the subject with respect to the visual evaluation is thereby determined.

ステップＳ１５において、総合評価部５５が、ステップＳ１２における評価結果とステップＳ１４における評価結果とに基づいて被検者の人物評価を行う。その際に、総合評価部５５は、例えば、図１１に示すようなマッピングエリアを用いて、被検者の音声に関するランクと被検者の視覚的評価に関するランクとに基づいて被検者の人物評価を行う。 In step S15, the comprehensive evaluation unit 55 performs the person evaluation of the subject based on the evaluation result in step S12 and the evaluation result in step S14. In doing so, the comprehensive evaluation unit 55 uses, for example, mapping areas such as those shown in FIG. 11 to perform the person evaluation based on the rank of the subject's voice and the rank of the subject's visual evaluation.

＜動画像データの処理フロー＞
図１３及び図１４は、動画像データの処理フローの例を示すフローチャートである。この例において、動画像データは、１秒間に２４フレームの画像を表している。
図１３に示すステップＳ２１において、画像処理部５３が、被検者の視覚的評価のために人物評価装置に供給される動画像データを格納する生データ格納部７１から、撮像開始後５秒〜６０秒の５５秒間の評価期間における画像を表す動画像データを取得して、フレーム番号ｎを１に設定する。
<Processing flow of moving image data>
FIGS. 13 and 14 are flowcharts showing an example of the processing flow of moving image data. In this example, the moving image data represents images at 24 frames per second.
In step S21 shown in FIG. 13, the image processing unit 53 acquires, from the raw data storage unit 71 that stores the moving image data supplied to the person evaluation device for the visual evaluation of the subject, the moving image data representing the images in the 55-second evaluation period from 5 seconds to 60 seconds after the start of imaging, and sets the frame number n to 1.

ステップＳ２２において、画像処理部５３が、第ｎフレームの動画像データに対して顔認識処理を施すことにより、被検者の顔において認識される複数の特徴点を抽出し、それらの特徴点の座標を求める。さらに、ステップＳ２３において、画像処理部５３が、複数の特徴点の座標を、フレーム番号と共に座標データ格納部７５に格納する。 In step S22, the image processing unit 53 performs face recognition processing on the moving image data of the n-th frame, thereby extracting a plurality of feature points recognized on the subject's face and obtaining the coordinates of those feature points. Further, in step S23, the image processing unit 53 stores the coordinates of the plurality of feature points in the coordinate data storage unit 75 together with the frame number.

ステップＳ２４において、画像処理部５３が、フレーム番号ｎが１３２０（＝２４×５５）であるか、又は、第ｎフレームが動画像データの最終フレームであるか否かを判定する。フレーム番号ｎが１３２０よりも小さく、第ｎフレームが動画像データの最終フレームでない場合には、画像処理部５３が、フレーム番号ｎをインクリメントして（ｎ＋１）とし、処理がステップＳ２２に戻る。一方、フレーム番号ｎが１３２０であるか、又は、第ｎフレームが動画像データの最終フレームである場合には、処理がステップＳ２５〜Ｓ２７のいずれかに移行する。あるいは、ステップＳ２５〜Ｓ２７が順次処理されても良いし、並列処理されても良い。 In step S24, the image processing unit 53 determines whether the frame number n is 1320 (= 24 × 55) or the nth frame is the final frame of the moving image data. If the frame number n is smaller than 1320 and the nth frame is not the final frame of the moving image data, the image processing unit 53 increments the frame number n to (n + 1), and the process returns to step S22. On the other hand, when the frame number n is 1320 or the nth frame is the final frame of the moving image data, the process proceeds to any one of steps S25 to S27. Alternatively, steps S25 to S27 may be sequentially processed or may be processed in parallel.
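Steps S21 to S24 amount to a loop over frames that stops at frame 1320 or at the last available frame, whichever comes first. The sketch below shows that control flow only. The `detect_landmarks` callable is a hypothetical stand-in for the face recognition processing, and the dictionary stands in for the coordinate data storage unit 75.

```python
FRAME_RATE = 24                                # frames per second
EVALUATION_SECONDS = 55                        # evaluation period
MAX_FRAMES = FRAME_RATE * EVALUATION_SECONDS   # 1320


def extract_coordinates(frames, detect_landmarks):
    """Run steps S21-S24 and return {frame number: feature point coordinates}."""
    store = {}
    if not frames:
        return store
    n = 1                                           # S21: start at frame 1
    while True:
        store[n] = detect_landmarks(frames[n - 1])  # S22 and S23
        if n == MAX_FRAMES or n == len(frames):     # S24: stop condition
            break
        n += 1
    return store
```

Frame numbers are kept 1-based in the store to match the numbering used in the flowchart description.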

ステップＳ２５において、画像解析部５４が、座標データ格納部７５から各フレームにおける複数の特徴点の座標を読み出して、第１の軸を回転中心とする被検者の顔の動き量を算出するために必要な第１の三角形及び第２の三角形の面積又は高さをフレーム毎に求める。その後、処理がステップＳ２８（図１４）に移行する。 In step S25, the image analysis unit 54 reads out the coordinates of a plurality of feature points in each frame from the coordinate data storage unit 75, and calculates the amount of movement of the subject's face with the first axis as the rotation center. The area or height of the first triangle and the second triangle required for the above is obtained for each frame. After that, the process proceeds to step S28 (FIG. 14).

ステップＳ２６において、画像解析部５４が、座標データ格納部７５から各フレームにおける複数の特徴点の座標を読み出して、第２の軸を回転中心とする被検者の顔の動き量を算出するために必要な第１の三角形及び第２の三角形の面積をフレーム毎に求める。その後、処理がステップＳ２８（図１４）に移行する。 In step S26, the image analysis unit 54 reads out the coordinates of a plurality of feature points in each frame from the coordinate data storage unit 75, and calculates the amount of movement of the subject's face with the second axis as the rotation center. The areas of the first triangle and the second triangle required for the above are obtained for each frame. After that, the process proceeds to step S28 (FIG. 14).

ステップＳ２７において、画像解析部５４が、座標データ格納部７５から各フレームにおける複数の特徴点の座標を読み出して、第３の軸を回転中心とする被検者の顔の動き量を算出するために、被検者の顔の向きを表す量として、左右の目頭を結ぶ線の角度等をフレーム毎に求める。その後、処理がステップＳ２９（図１４）に移行する。 In step S27, the image analysis unit 54 reads out the coordinates of the plurality of feature points in each frame from the coordinate data storage unit 75 and, in order to calculate the amount of movement of the subject's face with the third axis as the center of rotation, obtains for each frame a quantity representing the orientation of the subject's face, such as the angle of the line connecting the inner corners of the left and right eyes. After that, the process proceeds to step S29 (FIG. 14).
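One way to obtain the angle of the line connecting the inner corners of the eyes is with `math.atan2` on the coordinate differences. This is a minimal sketch: the function name, the (x, y) tuple format, and reporting the angle in degrees relative to the image's horizontal axis are assumptions, and the sign of the angle depends on whether the image y axis points up or down.

```python
import math


def eye_line_angle(left_inner, right_inner):
    """Angle in degrees of the line from one inner eye corner to the other.

    Each argument is an (x, y) coordinate; 0 degrees means the line is
    parallel to the image's horizontal axis.
    """
    dx = right_inner[0] - left_inner[0]
    dy = right_inner[1] - left_inner[1]
    return math.degrees(math.atan2(dy, dx))
```

Computing this value for every frame yields the per-frame quantity whose variance is then taken in step S29.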

図１４に示すステップＳ２８において、画像解析部５４が、被検者の顔の向きを表す量として、第１の三角形と第２の三角形との面積又は高さの比の値をフレーム毎に求める。その後、処理がステップＳ２９に移行する。 In step S28 shown in FIG. 14, the image analysis unit 54 obtains, for each frame, the ratio of the areas or heights of the first triangle and the second triangle as the quantity representing the orientation of the subject's face. After that, the process proceeds to step S29.
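The area ratio in step S28 can be computed from feature point coordinates with the standard cross-product (shoelace) formula for a triangle. In this sketch the function names are hypothetical and each triangle is passed as three (x, y) tuples; which feature points form the first and second triangles is defined elsewhere in the specification and is not assumed here. Returning `None` for a degenerate second triangle is also an assumption.

```python
def triangle_area(p, q, r):
    """Area of the triangle with vertices p, q, r given as (x, y) tuples."""
    cross = (q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1])
    return abs(cross) / 2


def orientation_ratio(first_triangle, second_triangle):
    """Ratio of the first triangle's area to the second triangle's area.

    Returns None when the second triangle has zero area (assumption).
    """
    a1 = triangle_area(*first_triangle)
    a2 = triangle_area(*second_triangle)
    if a2 == 0:
        return None
    return a1 / a2
```

When the face turns about an axis, one triangle shrinks in the image while the other grows, so the ratio moves away from its frontal value.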

ステップＳ２９において、画像解析部５４が、２４フレーム（１秒間）における被検者の顔の向きを表す量の分散値を被検者の顔の動き量として算出する。さらに、ステップＳ３０において、画像解析部５４が、評価期間（５５秒間）において得られる複数の分散値を、その大きさに応じて複数の階級に分類することにより、各々の分散値の存在確率を求める。 In step S29, the image analysis unit 54 calculates the variance value of the quantity representing the orientation of the subject's face over 24 frames (1 second) as the amount of movement of the subject's face. Further, in step S30, the image analysis unit 54 classifies the plurality of variance values obtained in the evaluation period (55 seconds) into a plurality of classes according to their magnitude, thereby obtaining the existence probability of each variance value.

ステップＳ３１において、画像解析部５４が、評価期間における分散値の確率分布に基づいて被検者の視覚的評価に関するランクを判定する。それにより、評価期間に相当する動画像データにおいて被検者の顔の特徴点の座標を求めることができた割合が一定の割合以上である場合に、被検者の顔の動き量に応じて、その被検者の画像が複数のランクのいずれかに分類される。 In step S31, the image analysis unit 54 determines the rank of the subject with respect to the visual evaluation based on the probability distribution of the variance values in the evaluation period. As a result, when the proportion of the moving image data corresponding to the evaluation period in which the coordinates of the feature points of the subject's face could be obtained is equal to or greater than a certain proportion, the image of the subject is classified into one of a plurality of ranks according to the amount of movement of the subject's face.

以上説明したように、本発明の一実施形態によれば、被検者の音声を収録して得られる音声データからデータブロック毎に複数の周波数帯域における音圧分布を表す声紋データを生成して、複数の周波数帯域における音圧の大きさ及び広がりに応じてデータブロックを分類し、所定数のデータブロックの分類結果に基づいて被検者の音声に関する評価を行うことにより、人物評価の対象となる被検者の音声に基づいて、被検者の人物評価を行う際に参考となる情報を提供することができる。 As described above, according to an embodiment of the present invention, voiceprint data representing the sound pressure distribution in a plurality of frequency bands is generated for each data block from the voice data obtained by recording the voice of the subject, the data blocks are classified according to the magnitude and spread of sound pressure in the plurality of frequency bands, and the voice of the subject is evaluated based on the classification results of a predetermined number of data blocks. This makes it possible to provide, based on the voice of the subject who is the target of the person evaluation, information that serves as a reference when performing the person evaluation of the subject.

さらに、被検者の顔を撮像して得られる動画像データから被検者の顔において認識される複数の特徴点の座標を求めて、所定数のフレームにおける複数の特徴点の座標に基づいて被検者の顔の動き量を算出し、評価期間における被検者の顔の動き量の統計処理に基づいて被検者の視覚的な評価を行うことにより、人物評価の対象となる被検者の動画像及び音声に基づいて、被検者の人物評価を行う際に参考となる情報を提供することができる。 Further, the coordinates of a plurality of feature points recognized on the subject's face are obtained from the moving image data obtained by imaging the subject's face, the amount of movement of the subject's face is calculated based on the coordinates of the plurality of feature points in a predetermined number of frames, and a visual evaluation of the subject is performed based on statistical processing of the amount of movement of the subject's face during the evaluation period. This makes it possible to provide, based on the moving image and voice of the subject who is the target of the person evaluation, information that serves as a reference when performing the person evaluation of the subject.

以上説明した実施形態における判定方法は一例である。本発明は、それらの実施形態に限定されるものではなく、当該技術分野において通常の知識を有する者によって、本発明の技術的思想内で多くの変形が可能である。 The determination method in the embodiment described above is an example. The present invention is not limited to those embodiments, and many modifications can be made within the technical idea of the present invention by a person having ordinary knowledge in the art.

本発明は、人物を評価するために用いられる人物評価装置等において利用することが可能である。 The present invention can be used in a person evaluation device or the like used for evaluating a person.

１０…操作部、２０…表示部、３０…入出力インターフェース、４０…ネットワークインターフェース、５０…ＣＰＵ、５１…音声処理部、５２…音声解析部、５３…画像処理部、５４…画像解析部、５５…総合評価部、６０…メモリー、７０…格納部、７１…生データ格納部、７２…声紋データ格納部、７３…学習データ格納部、７４…評価データ格納部、７５…座標データ格納部 10 ... Operation unit, 20 ... Display unit, 30 ... Input / output interface, 40 ... Network interface, 50 ... CPU, 51 ... Voice processing unit, 52 ... Voice analysis unit, 53 ... Image processing unit, 54 ... Image analysis unit, 55 ... Comprehensive evaluation unit, 60 ... Memory, 70 ... Storage unit, 71 ... Raw data storage unit, 72 ... Voiceprint data storage unit, 73 ... Learning data storage unit, 74 ... Evaluation data storage unit, 75 ... Coordinate data storage unit

Claims

A person evaluation device comprising:
a voice processing unit that Fourier-transforms the voice data obtained by recording the voice of a subject for each data block per unit time and generates voiceprint data representing the sound pressure distribution in a plurality of frequency bands for each data block; and
a voice analysis unit that, when classifying data blocks according to the magnitude and spread of sound pressure in the plurality of frequency bands and evaluating the voice of the subject based on the classification results of a predetermined number of data blocks, gives a data block whose sound pressure exceeds a threshold in any of the frequency regions a higher score than a data block whose sound pressure is at or below the threshold in all of the frequency regions, gives a data block in which the number of frequency bands where the sound pressure exceeds the threshold and reaches a maximum exceeds a predetermined value a higher score than a data block in which that number is at or below the predetermined value, and determines the rank of the subject's voice based on the total or average of the scores of the predetermined number of data blocks.

The person evaluation device according to claim 1, further comprising:
an image processing unit that performs face recognition processing for each frame on the moving image data obtained by imaging the face of the subject, thereby extracting a plurality of feature points recognized on the subject's face and obtaining the coordinates of the plurality of feature points; and
an image analysis unit that calculates the amount of movement of the subject's face based on the coordinates of the plurality of feature points in a predetermined number of frames, and performs a visual evaluation of the subject based on statistical processing of the amount of movement of the subject's face during an evaluation period.

The person evaluation device according to claim 2, wherein the image analysis unit calculates the amount of movement of the subject's face with, as the center of rotation, a first axis defined based on the position of a specific part in the image of the subject represented by the moving image data, a second axis substantially orthogonal to the first axis, or a third axis substantially orthogonal to the first and second axes.

The person evaluation device according to claim 2 or 3, wherein the image analysis unit calculates, as the amount of movement of the subject's face, a variance value of a quantity representing the orientation of the subject's face in the predetermined number of frames, and determines the rank of the subject with respect to the visual evaluation based on a probability distribution of the variance value in the evaluation period.

The person evaluation device according to any one of claims 2 to 4, further comprising a comprehensive evaluation unit that performs a person evaluation of the subject based on the evaluation result from the voice analysis unit and the evaluation result from the image analysis unit.

A person evaluation program that causes a CPU to execute:
a procedure (a) of Fourier-transforming the voice data obtained by recording the voice of a subject for each data block per unit time and generating voiceprint data representing the sound pressure distribution in a plurality of frequency bands for each data block; and
a procedure (b) of, when classifying data blocks according to the magnitude and spread of sound pressure in the plurality of frequency bands and evaluating the voice of the subject based on the classification results of a predetermined number of data blocks, giving a data block whose sound pressure exceeds a threshold in any of the frequency regions a higher score than a data block whose sound pressure is at or below the threshold in all of the frequency regions, giving a data block in which the number of frequency bands where the sound pressure exceeds the threshold and reaches a maximum exceeds a predetermined value a higher score than a data block in which that number is at or below the predetermined value, and determining the rank of the subject's voice based on the total or average of the scores of the predetermined number of data blocks.

The person evaluation program according to claim 6, further causing the CPU to execute:
a procedure (c) of performing face recognition processing for each frame on the moving image data obtained by imaging the face of the subject, thereby extracting a plurality of feature points recognized on the subject's face and obtaining the coordinates of the plurality of feature points;
a procedure (d) of calculating the amount of movement of the subject's face based on the coordinates of the plurality of feature points in a predetermined number of frames, and performing a visual evaluation of the subject based on statistical processing of the amount of movement of the subject's face during an evaluation period; and
a procedure (e) of performing a person evaluation of the subject based on the evaluation result in the procedure (b) and the evaluation result in the procedure (d).

A person evaluation method including:
a step (a) of Fourier-transforming the voice data obtained by recording the voice of a subject for each data block per unit time and generating voiceprint data representing the sound pressure distribution in a plurality of frequency bands for each data block; and
a step (b) of, when classifying data blocks according to the magnitude and spread of sound pressure in the plurality of frequency bands and evaluating the voice of the subject based on the classification results of a predetermined number of data blocks, giving a data block whose sound pressure exceeds a threshold in any of the frequency regions a higher score than a data block whose sound pressure is at or below the threshold in all of the frequency regions, giving a data block in which the number of frequency bands where the sound pressure exceeds the threshold and reaches a maximum exceeds a predetermined value a higher score than a data block in which that number is at or below the predetermined value, and determining the rank of the subject's voice based on the total or average of the scores of the predetermined number of data blocks.

The person evaluation method according to claim 8, further including:
a step (c) of performing face recognition processing for each frame on the moving image data obtained by imaging the face of the subject, thereby extracting a plurality of feature points recognized on the subject's face and obtaining the coordinates of the plurality of feature points;
a step (d) of calculating the amount of movement of the subject's face based on the coordinates of the plurality of feature points in a predetermined number of frames, and performing a visual evaluation of the subject based on statistical processing of the amount of movement of the subject's face during an evaluation period; and
a step (e) of performing a person evaluation of the subject based on the evaluation result in the step (b) and the evaluation result in the step (d).