JP4774825B2

JP4774825B2 - Performance evaluation apparatus and method

Info

Publication number: JP4774825B2
Application number: JP2005182514A
Authority: JP
Inventors: 務澤田
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2005-06-22
Filing date: 2005-06-22
Publication date: 2011-09-14
Anticipated expiration: 2025-06-22
Also published as: JP2007004396A; US7523084B2; US20060293900A1

Description

The present invention relates to a performance evaluation apparatus and method for evaluating a user's performance against the performance required by a system.

In recent years, game devices that run various application software have been released, but most of them use general-purpose controllers such as joysticks and game pads, and control the game through simple modals such as buttons and sticks.

On the other hand, some game devices have a camera and control the game using the input image (for example, Eye Toy(TM): Play from Sony Computer Entertainment Inc., described in Non-Patent Document 1), and others have a microphone and control the game using the input voice (for example, OPERATORS SIDE from Sony Computer Entertainment Inc., described in Non-Patent Document 2). Even these game devices, however, control the game through a single modal, such as a motion region in the image or the text content of the speech.

Non-Patent Document 1: http://www.playstation.jp/land/soft/pickup/eyetoy_play.html
Non-Patent Document 2: http://www.playstation.jp/scej/title/operatorsside/01.html

However, a game device that controls a game through a single modal can offer the user only games with simple content. It cannot offer a game with more complex content, for example one in which the user acts and the story advances according to an evaluation of that acting.

The present invention has been proposed in view of these circumstances. Its object is to provide a performance evaluation apparatus and method that evaluate a user's performance against the performance required by the system and provide the evaluation result.

To achieve the above object, the performance evaluation apparatus according to the present invention evaluates a user's performance in a predetermined scene and comprises: recognition means for recognizing the user's performance through a plurality of modals; performance evaluation means for calculating a performance evaluation value for each modal using the recognition result data obtained for that modal by the recognition means and teacher data for that modal, and for calculating, using all of the per-modal performance evaluation values, a comprehensive evaluation value that evaluates the user's performance as a whole, thereby evaluating the user's performance in the current scene; comparison means for comparing each performance evaluation value with a first threshold serving as the reference value for the performance evaluation value, and for comparing the comprehensive evaluation value with a second threshold serving as the reference value for the comprehensive evaluation value; and notification means for notifying the user of the performance content for the next scene when the comprehensive evaluation value is equal to or greater than the second threshold, and, when the comprehensive evaluation value is less than the second threshold, for notifying the user of the points to correct in the current scene's performance for each modal whose performance evaluation value is less than the first threshold.

Likewise, to achieve the above object, the performance evaluation method according to the present invention evaluates a user's performance in a predetermined scene and comprises: a recognition step of recognizing the user's performance through a plurality of modals; a performance evaluation step of calculating a performance evaluation value for each modal using the recognition result data obtained for that modal in the recognition step and teacher data for that modal, and of calculating, using all of the per-modal performance evaluation values, a comprehensive evaluation value that evaluates the user's performance as a whole, thereby evaluating the user's performance in the current scene; a comparison step of comparing each performance evaluation value with a first threshold serving as the reference value for the performance evaluation value, and of comparing the comprehensive evaluation value with a second threshold serving as the reference value for the comprehensive evaluation value; and a notification step of notifying the user of the performance content for the next scene when the comprehensive evaluation value is equal to or greater than the second threshold, and, when the comprehensive evaluation value is less than the second threshold, of notifying the user of the points to correct in the current scene's performance for each modal whose performance evaluation value is less than the first threshold.

According to the performance evaluation apparatus and method of the present invention, a user's performance can be recognized through a plurality of modals, and the performance can be evaluated based on the recognition result for each modal.

Specific embodiments to which the present invention is applied are described in detail below with reference to the drawings.

First, the configuration of the performance evaluation apparatus according to this embodiment is described.

As shown in FIG. 1, the performance evaluation apparatus 1 according to this embodiment comprises a camera image input device 10, a microphone audio input device 11, a face recognizer 12, a line-of-sight recognizer 13, a facial expression recognizer 14, a hand gesture recognizer 15, a speech recognizer 16, a prosody recognizer 17, a performance evaluator 18, a scenario controller 19, an agent controller 20, and an agent 21.

The camera image input device 10 captures an image of the user performing and supplies it to the face recognizer 12 and the hand gesture recognizer 15.

The microphone audio input device 11 captures the voice of the user performing and supplies it to the speech recognizer 16 and the prosody recognizer 17.

When an image is supplied from the camera image input device 10, the face recognizer 12 recognizes the face image. Treating the supplied image as an orthogonal coordinate plane whose horizontal direction is the X axis and whose vertical direction is the Y axis, it detects the center position (x, y) of the face. It also detects the direction of the face at that position and expresses it using the roll, pitch, and yaw angles. The face recognizer 12 then supplies data F(t){x(t), y(t), roll(t), pitch(t), yaw(t)}, indicating the center position and direction of the face in the image at time t, to the performance evaluator 18. The face recognizer 12 also supplies the face image to the line-of-sight recognizer 13 and the facial expression recognizer 14. A face detection technique is described, for example, in "Learning a real-time arbitrary posture detector using pixel difference features" (Kotaro Sabe, Kenichi Hidai) (http://face.pdp.crl.sony.co.jp/index.html#mview_face_detection).

When a face image is supplied from the face recognizer 12, the line-of-sight recognizer 13 detects the line-of-sight direction of the right eye and of the left eye from the face image. It expresses the detected right-eye direction (θr, φr) and left-eye direction (θl, φl) in three-dimensional polar coordinates. The line-of-sight recognizer 13 then supplies data G(t){θr(t), φr(t), θl(t), φl(t)}, indicating the line-of-sight direction at time t, to the performance evaluator 18. A technique for detecting the line-of-sight direction in a predetermined coordinate system from a face image is described, for example, in Japanese Examined Patent Publication No. 3-51407.

When a face image is supplied from the face recognizer 12, the facial expression recognizer 14 detects 18 feature points 43 on the eyebrows 40, eyes 41, and mouth 42 in the face image, as shown in FIG. 2. To identify the user's facial expression, it then calculates the 16 inter-feature-point distances 44 on the eyebrows 40, eyes 41, and mouth 42, indicated by arrows in FIG. 2. The facial expression recognizer 14 then supplies data E(t){Ei(t)} (i = 0 to 15), indicating the inter-feature-point distances 44 at time t, to the performance evaluator 18. A technique for recognizing facial expressions using inter-feature-point distances is described in "Emotion recognition by integrating facial image information and speech information for use in robots" (Shohei Matsumoto, Ken Yamaguchi, Kazunori Komatani, Tetsuya Ogata, Hiroshi Okuno), 22nd Annual Conference of the Robotics Society of Japan, 3D14, Sep. 2004 (http://winnie.kuis.kyoto-u.ac.jp/okuno-lab-bib-j.html).

When an image is supplied from the camera image input device 10, the hand gesture recognizer 15 treats the image as an orthogonal coordinate plane whose horizontal direction is the X axis and whose vertical direction is the Y axis, and detects the center position (x, y) of the hand. It also detects, as the hand direction α, the rotation angle about an axis that passes through the position (x, y) and is perpendicular to the palm, and further detects the hand size s and the finger state l (for example, how many of the five fingers are raised). From these values the hand gesture recognizer 15 can identify the state of the gesture the user is making. It then supplies data H(t){x(t), y(t), s(t), α(t), l(t)}, indicating the hand gesture state at time t, to the performance evaluator 18. A hand gesture recognition technique is described in Lars Bretzner, Ivan Laptev and Tony Lindeberg, "Hand Gesture Recognition using Multi-Scale Colour Features, Hierarchical Models and Particle Filtering" (http://cg.cs.uni-bonn.de/docs/teaching/2003/SS/cv-seminar/documents/papers/bretzner02hand.pdf).

When voice is supplied from the microphone audio input device 11, the speech recognizer 16 converts the voice into text and supplies it to the performance evaluator 18 as text data T.

When voice is supplied from the microphone audio input device 11, the prosody recognizer 17 analyzes the voice and detects its pitch, power, and speed. The prosody recognizer 17 then supplies data PRO(t){pitch(t), power(t), speed(t)}, indicating the prosody at time t, to the performance evaluator 18.
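The per-modal data that the recognizers hand to the performance evaluator 18 can be pictured as small fixed-size records. The following is only an illustrative sketch: the class and field names are assumptions chosen to mirror the symbols F(t), G(t), E(t), H(t), and PRO(t) in the text, and the patent does not prescribe any particular data layout.

```python
from dataclasses import dataclass, astuple
from typing import Tuple


@dataclass(frozen=True)
class FaceData:
    """F(t): face center position and orientation in the image."""
    x: float
    y: float
    roll: float
    pitch: float
    yaw: float


@dataclass(frozen=True)
class GazeData:
    """G(t): gaze direction of the right and left eye in polar coordinates."""
    theta_r: float
    phi_r: float
    theta_l: float
    phi_l: float


@dataclass(frozen=True)
class ExpressionData:
    """E(t): the 16 inter-feature-point distances E0..E15."""
    distances: Tuple[float, ...]

    def __post_init__(self):
        # The text specifies exactly 16 distances (i = 0 to 15).
        if len(self.distances) != 16:
            raise ValueError("expected 16 feature-point distances")


@dataclass(frozen=True)
class HandData:
    """H(t): hand center, size s, rotation angle alpha, finger state l."""
    x: float
    y: float
    s: float
    alpha: float
    l: float


@dataclass(frozen=True)
class ProsodyData:
    """PRO(t): pitch, power, and speed of the voice."""
    pitch: float
    power: float
    speed: float


# Hypothetical sample values, only to show how a record becomes a flat vector.
face = FaceData(x=160.0, y=120.0, roll=0.0, pitch=10.0, yaw=-5.0)
face_vector = astuple(face)
```

Flattening each record with `astuple` gives the element list (F0 to F4, G0 to G3, and so on) over which the per-modal evaluation values are summed. The text data T from the speech recognizer 16 is a plain string and needs no record.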

The performance evaluator 18 uses the data supplied from each recognizer and the user performance teacher data supplied from the scenario controller 19, which serves as a model for the user's performance, to calculate a performance evaluation value for each modal and a comprehensive evaluation value for the user's performance, by a method based on weighted distances between data. Here, the comprehensive evaluation value is calculated from the performance evaluation values of all modals; it evaluates the user's performance as a whole and determines how the story progresses. The performance evaluator 18 then supplies the performance evaluation value data and the comprehensive evaluation value data to the scenario controller 19 as performance evaluation result data.

The following describes how the performance evaluator 18 uses the weighted distance between data to calculate the performance evaluation value for each modal at time t.

The performance evaluator 18 uses the data F(t) supplied from the face recognizer 12 and the user performance teacher data Ft(t) supplied from the scenario controller 19 to calculate the performance evaluation value Fscore(t) at time t by the following equation (1).

Fscore(t) = Σ_i exp(−Wi |Fi − Fti|) / N ... (1)
Here, i = 0 to 4, with F0 = x, F1 = y, F2 = roll, F3 = pitch, and F4 = yaw, so the number of data items is N = 5. Fti is the user performance teacher data for Fi, and Wi is the weighting coefficient for that data item.
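Equation (1) and the equations of the same form for the other non-text modals can be sketched as one helper. This is a minimal sketch, not the patented implementation: the function name `modal_score` and the sample values are assumptions, and only the formula itself comes from the text.

```python
import math
from typing import Sequence


def modal_score(observed: Sequence[float],
                teacher: Sequence[float],
                weights: Sequence[float]) -> float:
    """Mean of exp(-Wi * |Xi - Xti|) over the N elements of one modal."""
    if not observed or not (len(observed) == len(teacher) == len(weights)):
        raise ValueError("observed, teacher and weights must be non-empty "
                         "and of equal length")
    n = len(observed)
    total = sum(math.exp(-w * abs(o - t))
                for o, t, w in zip(observed, teacher, weights))
    return total / n


# Hypothetical face data (x, y, roll, pitch, yaw) and teacher data.
f_obs = [160.0, 120.0, 0.0, 10.0, -5.0]
f_teacher = [160.0, 120.0, 0.0, 10.0, -5.0]
f_weights = [0.01, 0.01, 0.05, 0.05, 0.05]
f_score = modal_score(f_obs, f_teacher, f_weights)  # identical data gives 1.0
```

Because each term is exp of a non-positive number, every term and therefore the mean lies in the interval (0, 1], with 1 reached only when the observed data matches the teacher data exactly. The weights Wi control how quickly the score decays as an element drifts from its teacher value.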

The performance evaluator 18 also uses the data G(t) supplied from the line-of-sight recognizer 13 and the user performance teacher data Gt(t) supplied from the scenario controller 19 to calculate the performance evaluation value Gscore(t) by the following equation (2).

Gscore(t) = Σ_i exp(−Wi |Gi − Gti|) / N ... (2)
Here, i = 0 to 3, with G0 = θr, G1 = φr, G2 = θl, and G3 = φl, so the number of data items is N = 4. Gti is the user performance teacher data for Gi, and Wi is the weighting coefficient for that data item.

The performance evaluator 18 also uses the data E(t) supplied from the facial expression recognizer 14 and the user performance teacher data Et(t) supplied from the scenario controller 19 to calculate the performance evaluation value Escore(t) by the following equation (3).

Escore(t) = Σ_i exp(−Wi |Ei − Eti|) / N ... (3)
Here, Ei is the data indicating the inter-feature-point distances, with i = 0 to 15 as shown in FIG. 2, so the number of data items is N = 16. Eti is the user performance teacher data for Ei, and Wi is the weighting coefficient for that data item.

The performance evaluator 18 also uses the data H(t) supplied from the hand gesture recognizer 15 and the user performance teacher data Ht(t) supplied from the scenario controller 19 to calculate the performance evaluation value Hscore(t) by the following equation (4).

Hscore(t) = Σ_i exp(−Wi |Hi − Hti|) / N ... (4)
Here, i = 0 to 4, with H0 = x, H1 = y, H2 = s, H3 = α, and H4 = l, so the number of data items is N = 5. Hti is the user performance teacher data for Hi, and Wi is the weighting coefficient for that data item.

The performance evaluator 18 also uses the text data T supplied from the speech recognizer 16 and the user performance teacher data Tt supplied from the scenario controller 19 to calculate the performance evaluation value Tscore(t) by the following equation (5).

Tscore(t) = exp(−W × Levenshtein distance) ... (5)
Here, the Levenshtein distance is the minimum number of characters that must be substituted, inserted, or deleted to convert the text data T supplied from the speech recognizer 16 into the user performance teacher data Tt supplied from the scenario controller 19. W is the weighting coefficient for the data.
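The text score in equation (5) can be sketched with a standard dynamic-programming Levenshtein distance. The function names and the sample weight below are assumptions for illustration; the patent specifies only the formula and the definition of the distance.

```python
import math


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character substitutions, insertions,
    or deletions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def text_score(recognized: str, teacher: str, weight: float) -> float:
    """Tscore = exp(-W * Levenshtein distance), as in equation (5)."""
    return math.exp(-weight * levenshtein(recognized, teacher))


# Hypothetical weight; a perfect match always yields a score of 1.0.
perfect = text_score("Oh, Romeo", "Oh, Romeo", weight=0.1)
```

With this form, each additional character error multiplies the score by exp(−W), so W sets how forgiving the evaluation is toward recognition errors or misspoken lines.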

The performance evaluator 18 also uses the data PRO(t) supplied from the prosody recognizer 17 and the user performance teacher data PROt(t) supplied from the scenario controller 19 to calculate the performance evaluation value PROscore(t) by the following equation (6).

PROscore(t) = Σ_i exp(−Wi |PROi − PROti|) / N ... (6)
Here, i = 0 to 2, with PRO0 = pitch, PRO1 = power, and PRO2 = speed, so the number of data items is N = 3. PROti is the user performance teacher data for PROi, and Wi is the weighting coefficient for that data item.

The performance evaluator 18 then uses the performance evaluation values at time t obtained by equations (1) to (6) above to calculate the performance evaluation values to be supplied to the scenario controller 19.

Here, the performance evaluator 18 treats the interval from the start to the end of a performance evaluation as one scene, defines scenes so that each speech recognition result falls within a single scene, and calculates the performance evaluation values to be supplied to the scenario controller 19 at the end of each scene.

Specifically, for modals other than voice, the performance evaluator 18 integrates the performance evaluation value at time t over the duration of the scene. For voice, it averages the performance evaluation value at time t over the number of times data was input during the scene. The values obtained in this way are the performance evaluation values supplied to the scenario controller 19.
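The per-scene aggregation can be sketched as follows. This is a sketch under stated assumptions: the text says only that non-voice scores are integrated over the scene time and voice scores are averaged over the number of inputs, so the fixed sampling interval `dt`, the rectangle-rule integration, and the optional normalization by scene duration (which keeps the result on the same 0 to 1 scale as the example scores) are all illustrative choices, not details from the patent.

```python
from typing import Sequence


def integrate_scene(scores: Sequence[float], dt: float,
                    normalize: bool = True) -> float:
    """Integrate per-frame scores over one scene (rectangle rule).

    With normalize=True the integral is divided by the scene duration,
    which is an assumption made here to keep the value in (0, 1].
    """
    total = sum(s * dt for s in scores)
    if not normalize:
        return total
    duration = dt * len(scores)
    return total / duration if duration > 0 else 0.0


def average_voice(scores: Sequence[float]) -> float:
    """Average voice scores over the number of inputs in the scene."""
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical samples: four video frames at 0.5 s intervals, two utterances.
raw_integral = integrate_scene([1.0, 1.0, 1.0, 1.0], dt=0.5, normalize=False)
scene_face = integrate_scene([1.0, 1.0, 1.0, 1.0], dt=0.5)
scene_text = average_voice([0.5, 1.0])
```

The two paths differ because image-based modals produce a score at every frame, while speech produces a score only when an utterance has been recognized, which is why the text averages the latter by input count.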

The performance evaluator 18 then takes the average, or a weighted average, of the performance evaluation values calculated above for all modals as the comprehensive evaluation value.
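The comprehensive evaluation value can be sketched as a plain or weighted mean over the per-modal values. The function name, the dictionary keys, and the sample weights are assumptions for illustration; the text states only that an average or a weighted average is used.

```python
from typing import Mapping, Optional


def comprehensive_score(modal_scores: Mapping[str, float],
                        weights: Optional[Mapping[str, float]] = None) -> float:
    """Average (or weighted average) of all per-modal evaluation values."""
    if not modal_scores:
        raise ValueError("at least one modal score is required")
    if weights is None:
        return sum(modal_scores.values()) / len(modal_scores)
    total_weight = sum(weights[name] for name in modal_scores)
    weighted_sum = sum(score * weights[name]
                       for name, score in modal_scores.items())
    return weighted_sum / total_weight


# Hypothetical two-modal example.
plain = comprehensive_score({"Fscore": 1.0, "Tscore": 0.5})
weighted = comprehensive_score({"Fscore": 1.0, "Tscore": 0.5},
                               weights={"Fscore": 3.0, "Tscore": 1.0})
```

A weighted average lets a scenario emphasize the modals that matter for a scene, for example weighting the spoken line more heavily than the gaze direction.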

The performance evaluator 18 then supplies the performance evaluation value data for each modal and the comprehensive evaluation value data to the scenario controller 19 as performance evaluation result data.

The scenario controller 19 supplies user performance teacher data to the performance evaluator 18 when the performance evaluator 18 evaluates a performance.

The scenario controller 19 also holds a fixed threshold for the performance evaluation value of each modal and for the comprehensive evaluation value. When performance evaluation result data is supplied from the performance evaluator 18, it compares each performance evaluation value with its threshold and the comprehensive evaluation value with its threshold.

Based on the comparison result, the scenario controller 19 supplies the agent controller 20 with agent action data indicating how the agent 21 should act toward the user, and supplies the next user performance teacher data to the performance evaluator 18.

When agent action data is supplied from the scenario controller 19, the agent controller 20 controls the agent 21 based on that data.

Under the control of the agent controller 20, the agent 21 acts according to the agent action data.

Next, the operation by which the performance evaluation apparatus 1 configured as described above evaluates the user's performance is described separately for the performance practice mode and the stage performance mode.

The performance evaluation apparatus 1 evaluates the user's performance in the order shown, for example, in the flowcharts of FIGS. 3 and 4.

FIG. 3 shows the operation of the performance evaluation apparatus 1 in the performance practice mode.

First, in step S1, the scenario controller 19 supplies the agent controller 20 with agent action data for telling the user what to perform in the first scene of the story. The scenario controller 19 also supplies user performance teacher data corresponding to the first scene of the story to the performance evaluator 18.

Next, in step S2, the agent 21, under the control of the agent controller 20, acts as a stage director and tells the user what to perform in the first scene.

In step S3, the camera image input device 10 captures the image, and the microphone audio input device 11 captures the voice, of the user performing as instructed by the agent 21, and they supply the captured image and voice to the recognizers described above.

Then, in step S4, each recognizer performs image processing or audio processing and supplies the recognition result data for its modal to the performance evaluator 18.

In step S5, the performance evaluator 18 calculates the user's performance evaluation values using the data supplied from each recognizer and the user performance teacher data supplied from the scenario controller 19, and supplies the performance evaluation value data and the comprehensive evaluation value data to the scenario controller 19 as performance evaluation result data.

In step S6, when the performance evaluation result data is supplied, the scenario controller 19 compares each performance evaluation value with its threshold and the comprehensive evaluation value with its threshold. If the comprehensive evaluation value is equal to or greater than its threshold, the process proceeds to step S7. If it is less than its threshold, the process proceeds to step S9.

If the comprehensive evaluation value is equal to or greater than its threshold, in step S7 the scenario controller 19 supplies the agent controller 20 with agent action data for telling the user what to perform in the next scene. The scenario controller 19 also supplies user performance teacher data corresponding to the next scene to the performance evaluator 18.

In step S8, the agent 21, under the control of the agent controller 20, acts as a stage director and tells the user what to perform in the next scene.

If, on the other hand, the comprehensive evaluation value is less than its threshold, in step S9 the scenario controller 19 supplies the agent controller 20 with agent action data for telling the user the points to correct in the current performance, for each modal whose performance evaluation value is less than its threshold. The scenario controller 19 also supplies the user performance teacher data corresponding to the current scene to the performance evaluator 18 again.

In step S10, the agent 21, under the control of the agent controller 20, tells the user the points to correct in the current performance.

In this way, in the performance practice mode, the result of comparing the comprehensive evaluation value with its threshold determines whether the user goes on to perform the next scene or performs the current scene again.
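The branching in steps S6 to S10 can be sketched as a small decision function. The function name, the returned labels, and the per-modal threshold of 0.6 are assumptions made for illustration; the text states only that the comprehensive value is compared with its threshold and that correction points are reported for modals below their own thresholds.

```python
from typing import Dict, List, Tuple


def practice_decision(modal_scores: Dict[str, float],
                      comprehensive: float,
                      modal_thresholds: Dict[str, float],
                      comprehensive_threshold: float) -> Tuple[str, List[str]]:
    """Return the next action and the modals that need correction."""
    if comprehensive >= comprehensive_threshold:
        # Steps S7 and S8: announce the next scene.
        return "next_scene", []
    # Steps S9 and S10: repeat the scene and report the weak modals.
    weak = [name for name, score in modal_scores.items()
            if score < modal_thresholds[name]]
    return "repeat_scene", weak


# Scores from the first attempt in the practice-mode dialogue; the
# per-modal threshold value 0.6 is hypothetical.
scores = {"Fscore": 0.9, "Gscore": 0.9, "Escore": 0.8,
          "Hscore": 0.8, "Tscore": 0.5, "PROscore": 0.3}
thresholds = {name: 0.6 for name in scores}
action, weak_modals = practice_decision(scores, 0.70, thresholds, 0.80)
```

Under these assumed thresholds the weak modals are the text and prosody scores, which corresponds to feedback about the delivery of the lines rather than about expression or pose.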

For example, in the performance practice mode, the story advances through an interaction between the performance evaluation apparatus 1 and the user such as the following:
Agent A (stage director): "Okay, look up at the sky and try saying, 'Oh, Romeo, why are you Romeo?'"
User: (in front of the camera image input device and the microphone audio input device, looking up at the sky) "Oh, Romeo, why are you Romeo?"
Performance evaluator: Fscore = 0.9, Gscore = 0.9, Escore = 0.8, Hscore = 0.8, Tscore = 0.5, PROscore = 0.3
Scenario controller: Score_Threshold(80) > Score(70) Then Repeat
Agent A (stage director): "That performance was 70 points. Your expression and pose are good, but your delivery of the lines isn't quite there. Try it once more."
User: (in front of the camera image input device and the microphone audio input device, looking up at the sky) "Oh, Romeo, why are you Romeo?"
Performance evaluator: Fscore = 0.9, Gscore = 0.9, Escore = 0.9, Hscore = 0.9, Tscore = 1, PROscore = 0.9
Scenario controller: Score_Threshold(80) < Score(92) Then Go To Next
Agent A (stage director): "You did quite well. 92 points! Now, shall we move on to the next scene?"
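The scores of 70 and 92 in the dialogue are consistent with taking the unweighted mean of the six per-modal values and scaling it to 100 points. The check below is a sketch of that arithmetic; the function name `scene_points` and the rounding to whole points are assumptions, since the text does not state how the displayed score is derived.

```python
from typing import Sequence


def scene_points(modal_scores: Sequence[float]) -> int:
    """Unweighted mean of the modal scores, scaled to 0-100 and rounded."""
    return round(100 * sum(modal_scores) / len(modal_scores))


# Order: Fscore, Gscore, Escore, Hscore, Tscore, PROscore.
first_attempt = scene_points([0.9, 0.9, 0.8, 0.8, 0.5, 0.3])
second_attempt = scene_points([0.9, 0.9, 0.9, 0.9, 1.0, 0.9])
```

The first attempt sums to 4.2 over six modals (mean 0.70), below the threshold of 80, so the scene repeats; the second sums to 5.5 (mean about 0.917), above the threshold, so the story moves on.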

FIG. 4, on the other hand, shows the operation of the performance evaluation apparatus 1 in the stage performance mode.

First, in step S11, the scenario controller 19 supplies the agent controller 20 with agent action data that controls the agent's actions in the first scene of the story. The scenario controller 19 also supplies user performance teacher data corresponding to the first scene of the story to the performance evaluator 18.

Next, in step S12, the agent 21, under the control of the agent controller 20, performs the first scene of the story as the user's acting partner.

Then, in step S13, when the user performs the first scene in response to the performance of the agent 21 as acting partner, the camera image input device 10 captures the image, and the microphone audio input device 11 captures the voice, of the user's performance, and they supply the captured image and voice to the recognizers described above.

Then, in step S14, each recognizer performs image processing or audio processing and supplies the recognition result data for its modal to the performance evaluator 18.

In step S15, the performance evaluator 18 calculates the user's performance evaluation values using the data supplied from each recognizer and the user performance teacher data supplied from the scenario controller 19, and supplies the performance evaluation value data and the comprehensive evaluation value data to the scenario controller 19 as performance evaluation result data.

In step S16, the scenario controller 19 supplies the agent controller 20 with agent action data that controls the agent's actions in the next scene and with agent action data for showing the user an audience reaction corresponding to the comprehensive evaluation value. The scenario controller 19 also supplies user performance teacher data corresponding to the next scene to the performance evaluator 18.

In step S17, the agent 21, under the control of the agent controller 20, acts as the audience and shows the user a reaction to the user's performance of the current scene.

In step S18, the agent 21, under the control of the agent controller 20, performs the next scene as the user's acting partner.

For example, in the stage performance mode,
Agent A (Romeo): "If I have defiled this by touching it with my hand, then, being a blushing pilgrim, let me atone for it with a kiss."
User (Juliet): (bashfully, in front of the camera image input device and the microphone audio input device) "Your devotion is all too well-mannered and refined. Even saints have hands, and a pilgrim may touch them. But you must not kiss."
Performance evaluator: Fscore = 0.9, Gscore = 0.9, Escore = 0.9, Hscore = 0.9, Tscore = 1, PROscore = 0.9
Scenario controller: Score_Threshold(80) < Score(92) Then Go To Next
Agent B (audience): Loud applause
The story advances through interactions such as these between the performance evaluation apparatus 1 and the user.
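
Steps S11 to S18 and the example above can be sketched as a simple loop. In this sketch the capture, recognition, and scoring of steps S13 to S15 are folded into a caller-supplied `evaluate` function, the story always moves on to the next scene, and the reaction shown for a score at or below the threshold ("faint applause") is invented; only the threshold of 80 and the loud applause come from the example.

```python
from typing import Callable, List, Sequence


def run_stage_mode(scenes: Sequence[str],
                   evaluate: Callable[[str], int],
                   threshold: int = 80) -> List[str]:
    """Walk through the scenes once and return a log of what the agents do."""
    log: List[str] = []
    for scene in scenes:
        # S12 / S18: the agent plays the user's counterpart in this scene.
        log.append(f"counterpart performs {scene}")
        # S13-S15: capture, recognition and scoring are folded into `evaluate`.
        score = evaluate(scene)
        # S16-S17: the audience agent reacts according to the overall score.
        reaction = "loud applause" if score > threshold else "faint applause"
        log.append(f"audience: {reaction} ({score} points)")
    return log


for line in run_stage_mode(["scene 1", "scene 2"], lambda scene: 92):
    print(line)
```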

The configuration and operation of the performance evaluation apparatus 1 have been described above for the case where the performance evaluator 18 calculates the performance evaluation value using weighted distances between data. Besides the method using weighted distances between data, the performance evaluator 18 may also calculate the performance evaluation value using a Bayesian network.

FIG. 5 shows the configuration of the performance evaluator 18 using a Bayesian network.

FIG. 5 also shows the recognizers and the scenario controller 19.

The performance evaluator 18 has a Bayesian network structure composed of nodes whose state variables are the recognition results of the recognizer for each modal at each time and a node whose state variable is the scene type, with the causal relationships among the nodes connected by a directed graph.

Here, each node has a conditional probability distribution (CPD) or a conditional probability table (CPT).

The Bayesian network structure may be designed by a designer or acquired by learning from sample data. In the latter case, time-series data from each recognizer, collected while a user actually gives a certain performance, is used as the sample data, and the structure is acquired using the K2 algorithm, an MCMC method, or the like.

Using the Bayesian network, the performance matching probability, which indicates how well the user's performance fits the current scene, is calculated as follows.

First, the scenario controller 19 supplies the performance evaluator 18 with user performance teacher data (SCENE_ID) indicating the ID of the current scene.

After the recognizer for each modal performs image processing and audio processing, the recognizers other than the speech recognizer 16 supply their recognition result data to the corresponding nodes at each time. The speech recognizer 16 supplies its recognition result data to the corresponding node at the times when it produces recognition result data.

Subsequently, when the recognition result data of the recognizers at each time is supplied to the nodes, the performance evaluator 18 performs inference by a method such as the π-λ method, the junction tree algorithm, or loopy BP, and calculates the performance matching probability {Prob.(SCENE_ID)} as the evaluation value for the performance of each scene.

Thereafter, the performance evaluator 18 supplies the performance evaluation value data, expressed as this performance matching probability, to the scenario controller 19 as performance evaluation result data.

The scenario controller 19 holds a fixed threshold for this performance evaluation value and, when performance evaluation result data is supplied from the performance evaluator 18, compares the performance evaluation value with the threshold.

Based on the comparison result, the scenario controller 19 supplies the agent controller 20 with agent operation data indicating how the agent 21 should act toward the user, and supplies the performance evaluator 18 with user performance teacher data (SCENE_ID) indicating the ID of the current or next scene.
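
The procedure above can be illustrated with a toy network. The sketch below assumes the simplest possible structure, a scene node with one child node per recognizer observed at a single time, and computes Prob.(SCENE_ID) by exact enumeration rather than by the π-λ method, the junction tree algorithm, or loopy BP. The modal names, the probability tables, the threshold value PROB_THRESHOLD, and the behavior below the threshold are all invented for illustration.

```python
from typing import Dict

# Prior over scene IDs and conditional probability tables P(observation | scene).
# All numbers are invented for this example.
PRIOR: Dict[int, float] = {1: 0.5, 3: 0.5}
CPT: Dict[str, Dict[int, Dict[str, float]]] = {
    "expression": {1: {"shy": 0.2, "angry": 0.8}, 3: {"shy": 0.9, "angry": 0.1}},
    "gaze": {1: {"up": 0.7, "down": 0.3}, 3: {"up": 0.25, "down": 0.75}},
}
PROB_THRESHOLD = 0.8  # assumed; the text does not give a value


def scene_posterior(evidence: Dict[str, str]) -> Dict[int, float]:
    """Posterior Prob.(SCENE_ID) given one recognition result per modal."""
    joint: Dict[int, float] = {}
    for scene, prior in PRIOR.items():
        p = prior
        for node, value in evidence.items():
            p *= CPT[node][scene][value]
        joint[scene] = p
    total = sum(joint.values())
    return {scene: p / total for scene, p in joint.items()}


def scenario_decision(evidence: Dict[str, str], scene_id: int) -> str:
    """Compare the matching probability for the current scene with the threshold."""
    prob = scene_posterior(evidence)[scene_id]
    return "go to next" if prob > PROB_THRESHOLD else "stay on current scene"


posterior = scene_posterior({"expression": "shy", "gaze": "down"})
print(round(posterior[3], 2))  # 0.92
```

A network learned from real recognizer time series would have many more nodes, one set per time, which is where the approximate or message-passing inference methods named above become necessary.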

For example, in the stage performance mode for the scene with ID = 3,
Agent A (Romeo): "If I have defiled this by touching it with my hand, then, being a blushing pilgrim, let me atone for it with a kiss."
User (Juliet): (bashfully, in front of the camera image input device and the microphone audio input device) "Your devotion is all too well-mannered and refined. Even saints have hands, and a pilgrim may touch them. But you must not kiss."
Performance evaluator: Prob.(SCENE_ID = 3) = 0.9
Scenario controller: Prob_Threshold < Prob.(SCENE_ID = 3) Then Go To Next
Agent B (audience): Loud applause
The story advances through interactions such as these between the performance evaluation apparatus 1 and the user.

It should be noted that the present invention is not limited to the embodiments described above, and various modifications are of course possible without departing from the gist of the present invention.

For example, the embodiments above described a method using weighted distances between data and a method using a Bayesian network as methods by which the performance evaluator 18 calculates the performance evaluation value. However, it is also possible to configure a performance evaluation apparatus that integrates a performance evaluator using weighted distances between data with a performance evaluator using a Bayesian network. Specifically, weighted distances between data may be calculated from the data supplied from some of the recognizers, and the results may be supplied to a Bayesian network to calculate the comprehensive evaluation value of the performance. Alternatively, the data supplied from some of the recognizers may be supplied to a Bayesian network, and the comprehensive evaluation value of the performance may be calculated from the results using weighted distances between data.
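
The first of these combinations might look like the following sketch, in which a weighted distance per modal is discretized and entered as evidence into a small network. The weighted Euclidean form of the distance, the cutoff, the node names, and every probability value are assumptions chosen for illustration, not values from the embodiment.

```python
import math
from typing import Dict, Sequence

# Hypothetical tables: P(performance fits scene) and P(distance bucket | fits).
PRIOR: Dict[bool, float] = {True: 0.5, False: 0.5}
CPT: Dict[str, Dict[bool, Dict[str, float]]] = {
    "expression": {True: {"close": 0.9, "far": 0.1}, False: {"close": 0.3, "far": 0.7}},
    "gesture": {True: {"close": 0.8, "far": 0.2}, False: {"close": 0.4, "far": 0.6}},
}


def weighted_distance(x: Sequence[float], teacher: Sequence[float],
                      weights: Sequence[float]) -> float:
    """Weighted Euclidean distance between a recognition vector and teacher data."""
    return math.sqrt(sum(w * (a - t) ** 2 for w, a, t in zip(weights, x, teacher)))


def bucket(distance: float, cutoff: float = 0.5) -> str:
    """Discretize a distance so it can serve as evidence for a network node."""
    return "close" if distance <= cutoff else "far"


def matching_probability(evidence: Dict[str, str]) -> float:
    """Probability that the performance fits the scene, by exact enumeration."""
    joint: Dict[bool, float] = {}
    for fits, prior in PRIOR.items():
        p = prior
        for node, value in evidence.items():
            p *= CPT[node][fits][value]
        joint[fits] = p
    return joint[True] / sum(joint.values())


evidence = {
    "expression": bucket(weighted_distance([0.9, 0.1], [1.0, 0.0], [1.0, 1.0])),
    "gesture": bucket(weighted_distance([0.2, 0.8], [0.3, 0.6], [2.0, 1.0])),
}
print(round(matching_probability(evidence), 2))  # 0.86
```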

In addition to the modals used in the embodiments described above, the modals that the recognizers use to recognize the user's performance may include, for example, foot movement, mouth shape, eye shape, eyebrow shape, and posture.

FIG. 1 is a block diagram schematically showing the configuration of the performance evaluation apparatus according to the present embodiment.
FIG. 2 is a diagram showing the facial expression feature points and the distances between feature points detected by the facial expression recognizer according to the present embodiment.
FIG. 3 is a flowchart showing the operation sequence of the performance evaluation apparatus in the performance practice mode.
FIG. 4 is a flowchart showing the operation sequence of the performance evaluation apparatus in the stage performance mode.
FIG. 5 is a diagram showing the performance evaluator using a Bayesian network.

Explanation of symbols

1 performance evaluation apparatus, 10 camera image input device, 11 microphone audio input device, 12 face recognizer, 13 gaze recognizer, 14 facial expression recognizer, 15 hand gesture recognizer, 16 speech recognizer, 17 prosody recognizer, 18 performance evaluator, 19 scenario controller, 20 agent controller, 21 agent, 40 eyebrow, 41 eye, 42 mouth, 43 feature point, 44 distance between feature points

Claims

1. A performance evaluation apparatus for evaluating a user's performance in a predetermined scene, comprising:
recognition means for recognizing the user's performance through a plurality of modals;
performance evaluation means for evaluating the user's performance in the current scene by calculating a performance evaluation value for each modal using the recognition result data for each modal obtained by the recognition means and teacher data for each modal, and by calculating, using the performance evaluation values of all the modals, a comprehensive evaluation value that comprehensively evaluates the user's performance;
comparison means for comparing the performance evaluation value with a first threshold serving as a reference value for the performance evaluation value, and comparing the comprehensive evaluation value with a second threshold serving as a reference value for the comprehensive evaluation value; and
notification means for notifying the user of the user's performance content in the next scene when the comprehensive evaluation value is equal to or greater than the second threshold, and for notifying the user, when the comprehensive evaluation value is less than the second threshold, of points to correct in the performance of the current scene for each modal whose performance evaluation value is less than the first threshold.

2. The performance evaluation apparatus according to claim 1, wherein
the recognition result data and the teacher data are represented by vectors, and
the performance evaluation means calculates the performance evaluation value using the distance between the vectors of the recognition result data and the teacher data.

3. The performance evaluation apparatus according to claim 1, wherein the recognition means uses, as the modals, at least two of the position and direction of the face, the direction of the line of sight, facial expression, hand and foot movement, mouth shape, eye shape, eyebrow shape, posture, utterance content, and prosody.

4. A performance evaluation method for evaluating a user's performance in a predetermined scene, comprising:
a recognition step of recognizing the user's performance through a plurality of modals;
a performance evaluation step of evaluating the user's performance in the current scene by calculating a performance evaluation value for each modal using the recognition result data for each modal obtained in the recognition step and teacher data for each modal, and by calculating, using the performance evaluation values of all the modals, a comprehensive evaluation value that comprehensively evaluates the user's performance;
a comparison step of comparing the performance evaluation value with a first threshold serving as a reference value for the performance evaluation value, and comparing the comprehensive evaluation value with a second threshold serving as a reference value for the comprehensive evaluation value; and
a notification step of notifying the user of the user's performance content in the next scene when the comprehensive evaluation value is equal to or greater than the second threshold, and of notifying the user, when the comprehensive evaluation value is less than the second threshold, of points to correct in the performance of the current scene for each modal whose performance evaluation value is less than the first threshold.