JP2015176592A

JP2015176592A - Animation generation device, animation generation method, and program

Info

Publication number: JP2015176592A
Application number: JP2014055203A
Authority: JP
Inventors: Shinya Takayama (高山 伸也); Kazufumi Ikeda (池田 和史); Shigeyuki Sakasawa (酒澤 茂之)
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2014-03-18
Filing date: 2014-03-18
Publication date: 2015-10-05
Anticipated expiration: 2034-03-18
Also published as: JP6222465B2

Abstract

PROBLEM TO BE SOLVED: To extract highly topical news and comments on that news from a WEB site, and to have a character with an appropriate animation present not only the news but also the comments on it. SOLUTION: An animation generation device that generates an animation of a character on the basis of arbitrary information includes: a temporal animation generation unit 40-2 that generates a temporal animation on the basis of a dynamic feature quantity of the emotion or subjectivity contained in a plurality of analyzed input data; and a spatial animation generation unit 40-3 that generates a spatial animation on the basis of a static feature quantity of the emotion or subjectivity contained in the plurality of analyzed input data.

Description

The present invention relates to a technique for extracting highly topical news and comments on that news from a WEB (World Wide Web) site, and for having a character with an appropriate animation present the news and the comments.

Techniques for providing news information by means of a character are conventionally known. For example, Patent Document 1 discloses a system that generates character news by converting news information delivered in real time into speech and applying the mouth shapes and facial expressions corresponding to that speech to a character.

Patent Document 2 discloses a technique that evaluates SNS (Social Networking Service) comments related to news on the basis of a plurality of indices, and determines the optimum order in which to present the SNS comments by setting the weight of each index according to the news.

Patent Document 1: Japanese Patent No. 4489121
Patent Document 2: Japanese Patent No. 4134975

However, although the technique disclosed in Patent Document 1 can generate an animation in which a predetermined character reads the news aloud, it cannot generate an appropriate animation that combines the content of comments on the news posted on an SNS or the like with the emotional expressions contained in those comments.

The technique disclosed in Patent Document 2, on the other hand, can extract highly topical news or comments, but has difficulty presenting them through a character whose animation suits them.

The present invention has been made in view of these circumstances. Its object is to provide an animation generation device, an animation generation method, and a program that extract highly topical news and comments on that news from a WEB site and have a character with an appropriate animation present not only the news but also the comments on it. In addition, by aggregating the comments on the news in advance, the present invention allows the news text to be presented by a character whose animation reflects the aggregated comments. Furthermore, by adding whole-body motion of the character as well as changes of facial expression, the present invention can provide content close to a real news program produced by a television station.

(1) To achieve the above object, the present invention takes the following measures. That is, the animation generation device of the present invention is an animation generation device that generates an animation of a character on the basis of arbitrary information, and comprises: a temporal animation generation unit that generates a temporal animation on the basis of a dynamic feature quantity of the emotion or subjectivity contained in an arbitrary plurality of analyzed input data; and a spatial animation generation unit that generates a spatial animation on the basis of a static feature quantity of the emotion or subjectivity contained in the plurality of analyzed input data.

Because the temporal animation is generated from the dynamic feature quantity of the emotion or subjectivity contained in the analyzed input data, and the spatial animation is generated from the static feature quantity of that emotion or subjectivity, a character with an animation appropriate to the input information can be created.

(2) The animation generation device of the present invention further comprises an information extraction unit that extracts the input data from a WEB (World Wide Web) site.

Because the input data is extracted from a WEB site, a character animation can be generated that matches the content of comments on the news posted on an SNS or the like.

(3) The animation generation device of the present invention further comprises an input data analysis unit that analyzes the emotion or subjectivity in the plurality of input data.

Because the emotion or subjectivity in the plurality of input data is analyzed, a character animation can be generated that matches the emotional expressions contained in comments on the news posted on an SNS or the like.

(4) The animation generation device of the present invention further comprises a character reproduction unit that presents the input data and reproduces the generated animation of the character.

Because the input data is presented and the generated animation of the character is reproduced, a character with an animation appropriate to the input information can be displayed.

(5) The animation generation device of the present invention further comprises a voice recording unit that acquires the voice of a narrator reading the input data aloud.

Because the voice of the narrator reading the input data is acquired, speech based on a human voice can be reproduced.

(6) The animation generation device of the present invention further comprises a speech synthesis unit that synthesizes voice data corresponding to the input data.

Because voice data corresponding to the input data is synthesized, the speech can be produced artificially. In addition, since reading by a narrator is no longer required, production costs can be reduced.

(7) The animation generation device of the present invention further comprises a voice reproduction unit that reproduces the acquired voice or the synthesized voice together with the generated animation of the character.

Because the acquired or synthesized voice is reproduced together with the generated animation of the character, the voice can be output along with the animation. This makes it possible, for example, to offer a service in which a newscaster character reads the news.

(8) The animation generation device of the present invention further comprises a dialogue time determination unit that detects the voiced sections in the voice of the input data and determines the start time and end time of the character's dialogue.

Because the voiced sections in the voice of the input data are detected and the start time and end time of the character's dialogue are determined, the dialogue (text) data to be read by the character can be generated even when the input is speech.

(9) The animation generation device of the present invention further comprises a time length adjustment unit that converts the animation data according to the start time and end time of the dialogue.

Because the animation data is converted according to the start time and end time of the dialogue, the character animation can be generated without synchronization drift between the image signal and the audio signal, even when the character's dialogue is generated dynamically.

(10) In the animation generation device of the present invention, the animation is the spatial coordinates of an arbitrary bone or polygon of the character at an arbitrary time while the character performs a whole-body motion or changes its facial expression.

Because the animation is data giving the spatial coordinates of the character's bones or polygons at an arbitrary time, a character with a complex animation matched to the dialogue can be generated.

(11) The animation generation method of the present invention is an animation generation method for generating an animation of a character on the basis of arbitrary information, and includes at least the steps of: generating a temporal animation on the basis of a dynamic feature quantity of the emotion or subjectivity contained in an arbitrary plurality of analyzed input data; and generating a spatial animation on the basis of a static feature quantity of the emotion or subjectivity contained in the plurality of analyzed input data.

(12) The program of the present invention is a program for an animation generation device that generates an animation of a character on the basis of arbitrary information, and causes a computer to execute a series of processes comprising: a process of generating a temporal animation on the basis of a dynamic feature quantity of the emotion or subjectivity contained in an arbitrary plurality of analyzed input data; and a process of generating a spatial animation on the basis of a static feature quantity of the emotion or subjectivity contained in the plurality of analyzed input data.

According to the present invention, a character with an animation appropriate to the input information can be created.

A diagram showing the schematic configuration of the character information presentation device according to the first embodiment.
A block diagram showing the functions of the character information presentation device according to the first embodiment.
A flowchart showing the operation of the character information presentation device according to the first embodiment.
A diagram showing the schematic configuration of the animation generation device according to the second embodiment.
A block diagram showing the functions of the animation generation device according to the second embodiment.
A flowchart showing the operation of the animation generation device according to the second embodiment.
A diagram showing dialogue data consisting of text data, a start time, and an end time.
A diagram showing animation data.
A diagram showing emotion data.
A diagram showing the data format according to the second embodiment.

[First Embodiment]
The character information presentation device according to this embodiment of the present invention extracts arbitrary information from a WEB site as input, analyzes the emotion or subjectivity in the plurality of input data, generates a temporal animation on the basis of a dynamic feature quantity of the emotion or subjectivity contained in the analyzed input data, generates a spatial animation on the basis of a static feature quantity of that emotion or subjectivity, presents the input data, and reproduces the generated animation of the character.

With this configuration, even informal text data such as a set of SNS comments can be presented by a character with an appropriate animation. Moreover, if the plurality of input data is analyzed as a whole, information can be presented by a character whose animation reflects the aggregated data, for example a news text presented in a way that reflects the comments on it. Furthermore, if the character animation controls the spatial coordinates of an arbitrary bone or polygon of the character at an arbitrary time during a whole-body motion or facial-expression change, content close to a real news program produced by a television station can be provided.

This embodiment uses news and comments on the WEB, but the technical idea of the present invention is not limited to these. It may equally use posts within SNS communities, messenger chat messages, word-of-mouth reviews on review sites, results of street interviews, operation and status information for public transport, weather information, or fortune-telling and horoscopes.

FIG. 1 is a diagram showing the schematic configuration of the character information presentation device according to this embodiment. The character information presentation device 1 consists of a news extraction server 10, a speaker 20, a display 30, and a PC (Personal Computer) 40. First, the news extraction server 10 extracts highly topical news and comments from WEB sites and inputs them to the PC 40 as news data 50 and comment data 60. Although FIG. 1 shows the news extraction server 10 connected to the PC 40, the technical idea of the present invention is not limited to this; the news data 50 or the comment data 60 may instead be input to the PC 40 offline.

The PC 40 is connected via a cable 40a to the speaker 20 and the display 30, which serve as the character information presentation device. The PC 40 analyzes the emotion data 80 or the subjectivity 90 in the input news data 50 and comment data 60. The PC 40 also generates temporal animation data 70-1 on the basis of the dynamic feature quantity of the emotion or subjectivity contained in the analyzed emotion data 80 or subjectivity 90. Further, the PC 40 generates spatial animation data 70-2 on the basis of the static feature quantity of the emotion or subjectivity contained in the analyzed emotion data 80 or subjectivity 90. The PC 40 then sends the image signals for the input news data 50 and comment data 60 and for the generated temporal animation data 70-1 and spatial animation data 70-2 of the character to the display 30 as needed.

The image sent from the PC 40 is shown on the display 30 as A1. In the character information presentation device according to this embodiment, the speech for the input news data 50 and comment data 60 is recorded or synthesized in advance, and the recorded or synthesized speech signal is sent to the speaker 20 as needed, simultaneously with the display of the character's temporal animation data 70-1 and spatial animation data 70-2. The speech signal need not be sent, however. For example, the character may be displayed with the audio muted and subtitles shown, so that it only mouths the words (so-called "kuchi-paku").

FIG. 2 is a block diagram showing the functions of the character information presentation device 1 according to the first embodiment. The news extraction unit 10-1 of the news extraction server 10 extracts highly topical news and comments from WEB sites and inputs them to the PC 40 as news data 50 and comment data 60.

The input data analysis unit 40-1 of the PC 40 analyzes the emotion data 80 or the subjectivity 90 in the news data 50 and comment data 60 input from the news extraction server 10. The temporal animation generation unit 40-2 of the PC 40 generates the temporal animation data 70-1 on the basis of the dynamic feature quantity of the emotion or subjectivity contained in the analyzed emotion data 80 or subjectivity 90. The spatial animation generation unit 40-3 of the PC 40 generates the spatial animation data 70-2 on the basis of the static feature quantity of the emotion or subjectivity contained in the analyzed emotion data 80 or subjectivity 90. The character reproduction unit 40-4 of the PC 40 reproduces the image signals for the input news data 50 and comment data 60 and for the generated temporal animation data 70-1 and spatial animation data 70-2 of the character.

FIG. 3 is a flowchart showing the operation of the character information presentation device 1 according to the first embodiment. First, the news extraction server 10 collects news on the WEB (step S1). Using RSS or the like, it collects information such as the URL, title, article body, genre, and delivery date and time of each target news item.
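Step S1 can be pictured with a small sketch. It assumes the feed is plain RSS 2.0 and maps the standard `<item>` children (`link`, `title`, `description`, `category`, `pubDate`) onto the fields named above; the feed content, the function name `collect_news`, and the dictionary keys are all hypothetical, and the source does not say how the server actually parses its feeds.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed, used only for illustration.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item>
      <title>New phone announced</title>
      <link>https://news.example.com/articles/1</link>
      <description>A maker announced a new phone today.</description>
      <category>Technology</category>
      <pubDate>Tue, 18 Mar 2014 09:00:00 +0900</pubDate>
    </item>
  </channel>
</rss>"""


def collect_news(rss_text):
    """Return one dict per <item>, holding the fields collected in step S1."""
    root = ET.fromstring(rss_text)
    news = []
    for item in root.iter("item"):
        news.append({
            "url": item.findtext("link", default=""),
            "title": item.findtext("title", default=""),
            "body": item.findtext("description", default=""),
            "genre": item.findtext("category", default=""),
            "delivered": item.findtext("pubDate", default=""),
        })
    return news


if __name__ == "__main__":
    for entry in collect_news(SAMPLE_RSS):
        print(entry["title"], entry["url"])
```

Real feeds often carry only a summary in `description`, so the full article body would probably have to be fetched from the item URL in a separate step.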

Next, the news extraction server 10 collects the comments related to the news by searching the SNS for the URL or title of the news collected in step S1, or for words that characterize the news, extracted using TF-IDF or the like (step S2). The comments related to the news can be collected by searching for the URL or title of the news with a search API or similar facility provided by the SNS. Alternatively, words that characterize the news may be taken from the title or body of the news article using TF-IDF or the like and used for the search. In that case, TF is set to the frequency of each word in the title or body of each news article, and DF to, for example, the frequency of each word across all articles.
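As a rough illustration of the keyword step, the sketch below scores the words of one article with TF-IDF and returns the highest-scoring ones as search terms. It assumes the articles are already tokenized, and it uses the common `tf * log(N / df)` form with DF counted as the number of articles containing the word; the source only says "TF-IDF or the like", so the exact weighting and the name `tfidf_keywords` are assumptions.

```python
import math
from collections import Counter


def tfidf_keywords(articles, index, top_k=3):
    """Return the top_k words of articles[index] ranked by TF-IDF.

    articles: list of token lists (tokenization is assumed to be done elsewhere).
    """
    n_docs = len(articles)
    # Document frequency: in how many articles each word occurs.
    df = Counter()
    for tokens in articles:
        df.update(set(tokens))
    target = articles[index]
    tf = Counter(target)
    scores = {
        word: (count / len(target)) * math.log(n_docs / df[word])
        for word, count in tf.items()
    }
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


if __name__ == "__main__":
    docs = [
        ["phone", "new", "phone", "price"],
        ["election", "new", "vote"],
        ["weather", "new", "rain"],
    ]
    print(tfidf_keywords(docs, 0, top_k=2))
```

A word such as "new" that occurs in every article scores zero, which is why it would not be chosen as a search term for any single news item.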

Next, the news extraction server 10 analyzes the comments collected in step S2 and extracts the highly topical news data 50 and comment data 60 (step S3). For this purpose, the news extraction server 10 has a subjective index extraction function and a subjectivity calculation function. The subjective index extraction function performs emoticon extraction, style extraction, and excitement extraction.

Emoticon extraction picks out, from the SNS users' comments, those that contain emoticons. Emoticons can be detected by using a dictionary registered in advance to determine whether a comment contains such an emotional expression. The number of emoticons is denoted by s.

[Examples of subjective SNS comments containing emoticons]
“えっ(;゜Δ゜)誰得？” (“Huh (;゜Δ゜) who benefits from this?”)
“欲しいかも(σ´□｀。)” (“I might want one (σ´□｀。)”)
“(；´∀｀)・・・うわぁ、誰得・・・” (“(；´∀｀) ... wow, who benefits from this ...”)
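The emoticon count s could be obtained roughly as follows. This is a minimal sketch that assumes the pre-registered dictionary is just a list of emoticon strings and that matching is exact substring search; the dictionary contents and the name `count_emoticons` are hypothetical.

```python
# Hypothetical pre-registered emoticon dictionary (illustration only).
EMOTICON_DICTIONARY = ["(;゜Δ゜)", "(σ´□｀。)", "(；´∀｀)"]


def count_emoticons(comment, dictionary=EMOTICON_DICTIONARY):
    """Return s, the number of dictionary emoticons occurring in the comment."""
    return sum(comment.count(emoticon) for emoticon in dictionary)


if __name__ == "__main__":
    sample = "えっ" + EMOTICON_DICTIONARY[0] + "誰得？"
    print(count_emoticons(sample))
```

A production dictionary would need to handle variant spellings of the same emoticon, which exact substring matching does not.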
Style extraction classifies the SNS comments into several groups according to their writing style. The style can be determined by looking at the end of the sentence and using the conjugated form of the final word (imperative form, hypothetical form, irrealis form, and so on). The sentence is parsed with a morphological analyzer, and the conjugated form of the sentence-final adjective, adjectival verb, or verb is extracted. For N types of conjugated form, the presence or absence of each form is expressed as w_1 ... w_n. For example, with w_1 for the imperative form, w_2 for the hypothetical form, and so on, a comment written in the imperative form has w_1 = 1 and w_2 ... w_n = 0.

[Examples of writing style]
“わぁ、誰得な仕様はやめろよ” (“Wow, drop this spec that benefits no one”) → imperative form
“欲しいけど、もうちょっと安かったらなぁ” (“I want it, but if only it were a bit cheaper”) → hypothetical form
“こんな誰得携帯は買わない。” (“I won't buy a pointless phone like this.”) → irrealis form
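The indicator values w_1 ... w_n amount to a one-hot encoding of the sentence-final conjugated form. The sketch below assumes a morphological analyzer has already produced that form as a label string; the label names, their order, and the function `style_vector` are hypothetical, and no particular analyzer API is implied.

```python
# Hypothetical label set; index 0 corresponds to w_1, index 1 to w_2, and so on.
CONJUGATION_FORMS = ["imperative", "hypothetical", "irrealis"]


def style_vector(final_form, forms=CONJUGATION_FORMS):
    """Return [w_1, ..., w_n]: 1 for the detected conjugated form, 0 elsewhere.

    final_form is assumed to come from a morphological analyzer run beforehand.
    """
    return [1 if form == final_form else 0 for form in forms]


if __name__ == "__main__":
    print(style_vector("imperative"))
```

A form outside the label set yields an all-zero vector, so such a comment would contribute nothing to the style term.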
Excitement extraction picks out expressions of excitement from the SNS comments. Such expressions are extracted by focusing on the repetition of characters: the extractor detects three or more consecutive occurrences of the same character, or consecutive occurrences of the same morpheme. For each comment, the maximum number of consecutive characters or morphemes is taken as the excitement level e of that comment.

[Examples of excitement expressions]
“この機能、誰得ｗｗｗｗｗｗｗｗ” (“This feature, who benefits lololol”) → 8 characters, so e = 8
“うおおおおお！欲しいいい！” (“Whoaaaa! I want iiit!”) → 5 characters, so e = 5
“おいおいおい誰得だよ” (“Hey hey hey, who benefits from this”) → 6 characters (“おい” three times), so e = 6
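The three examples above can be reproduced with a short sketch. It substitutes a regular expression for morphological analysis: any unit (one or more characters) repeated at least three times in a row counts, and e is the length in characters of the longest such run. The three-repetition threshold for multi-character units and the value 0 for comments without repetition are assumptions, as is the name `excitement_level`.

```python
import re

# A unit of one or more characters followed by at least two more copies of itself.
_REPEATED_UNIT = re.compile(r"(.+?)\1{2,}")


def excitement_level(comment):
    """Return e: the length in characters of the longest repeated run, or 0."""
    run_lengths = [len(match.group(0)) for match in _REPEATED_UNIT.finditer(comment)]
    return max(run_lengths, default=0)


if __name__ == "__main__":
    for text in ("この機能、誰得ｗｗｗｗｗｗｗｗ", "うおおおおお！欲しいいい！", "おいおいおい誰得だよ"):
        print(text, excitement_level(text))
```

Because the regular expression knows nothing about word boundaries, it may also flag repetitions that a morphological analyzer would not treat as a repeated morpheme.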
The subjectivity calculation function, in turn, calculates the subjectivity P of each comment on the basis of the subjective indices extracted from it. P can be calculated from ① the emoticon content s, ② the style features w_1 + w_2 + ... + w_n, and ③ the excitement level e extracted from the SNS comment, for example by a linear-combination formula such as Equation (1). Comments whose value of P is greater than or equal to a threshold are extracted as the highly topical comment data 60. At the same time, the corresponding news is extracted as the news data 50.
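Equation (1) itself is not reproduced in this text, so the following is only one linear combination consistent with the description: P = alpha * s + beta * (w_1 + ... + w_n) + gamma * e, followed by the threshold test. The coefficients `alpha`, `beta`, `gamma`, their default value of 1.0, the dictionary keys, and the function names are assumptions made for illustration.

```python
def subjectivity(s, w, e, alpha=1.0, beta=1.0, gamma=1.0):
    """Assumed linear combination of emoticon count s, style vector w, excitement e."""
    return alpha * s + beta * sum(w) + gamma * e


def select_topical(comments, threshold):
    """Keep the comments whose subjectivity P is at or above the threshold."""
    return [
        comment for comment in comments
        if subjectivity(comment["s"], comment["w"], comment["e"]) >= threshold
    ]


if __name__ == "__main__":
    comments = [
        {"text": "comment A", "s": 1, "w": [1, 0, 0], "e": 8},
        {"text": "comment B", "s": 0, "w": [0, 0, 0], "e": 0},
    ]
    print([c["text"] for c in select_topical(comments, threshold=3.0)])
```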

The subjectivity calculation function may also include an evaluation index weighting unit. In this case, the subjective comments are selected by weighting the subjective indices according to the characteristics of the WEB news, specifically (A) genre, (B) distribution source, and (C) delivery date and time. The genre, distribution source, delivery date and time, and so on of WEB news can be obtained at the same time as the news is collected using RSS. An example of a formula for the subjectivity P that takes the weighting into account is given as Equation (2). Here, the weight coefficients α_A, α_B, and α_C are the weights applied to the emoticon expressions in a comment according to genre, distribution source, and delivery date and time, respectively. Likewise, the weight coefficients β and γ are the weights for the style and for the excitement expressions, respectively.

For example, when the genre is sports, it is effective to set a large weight γ_A for the excitement expressions so that comments readers can more readily empathize with are displayed preferentially. When the genre is politics or economics, on the other hand, level-headed comments are called for, so the value of γ_A is made small. News distributed by weekly magazines tends to have headlines and text designed to stir the reader's emotions, so increasing the weight α_B, which gives priority to emotional comments, yields subjective comments that are easier to empathize with. Furthermore, news delivered on holidays is more often gentle in content than news delivered on weekdays, and readers are likely to want comments in the same vein; a selection method that sets the weight β_A so as to exclude comments written in the imperative form is therefore effective.
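The weighting idea can be sketched as a lookup of per-feature weights that are then applied to the three subjective indices. Equation (2) is not reproduced in this text, so multiplying the genre, source, and date weights together is an assumption, and every table entry and name below is hypothetical. The sketch also scales the whole style term on holidays, which is coarser than singling out the imperative form.

```python
# Hypothetical weight tables: news feature value -> {index name -> weight}.
GENRE_WEIGHTS = {"sports": {"gamma": 2.0}, "politics": {"gamma": 0.5}}
SOURCE_WEIGHTS = {"weekly_magazine": {"alpha": 1.5}}
DAY_WEIGHTS = {"holiday": {"beta": 0.5}}


def combined_weight(name, genre, source, day_type):
    """Multiply the genre, source, and date weights for one index (1.0 if unset)."""
    weight = 1.0
    for table, key in ((GENRE_WEIGHTS, genre), (SOURCE_WEIGHTS, source), (DAY_WEIGHTS, day_type)):
        weight *= table.get(key, {}).get(name, 1.0)
    return weight


def weighted_subjectivity(s, w, e, genre, source, day_type):
    """Assumed weighted form: alpha * s + beta * sum(w) + gamma * e."""
    alpha = combined_weight("alpha", genre, source, day_type)
    beta = combined_weight("beta", genre, source, day_type)
    gamma = combined_weight("gamma", genre, source, day_type)
    return alpha * s + beta * sum(w) + gamma * e


if __name__ == "__main__":
    print(weighted_subjectivity(1, [1, 0, 0], 8, "sports", "newspaper", "weekday"))
    print(weighted_subjectivity(1, [1, 0, 0], 8, "politics", "newspaper", "weekday"))
```

With these illustrative tables, the same comment scores higher under a sports article than under a politics article, which mirrors the intent described above.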

Next, the news data 50 and comment data 60 extracted in step S3 are input to the PC 40, which analyzes the emotion data 80 or the subjectivity 90 in them (step S4). The case where the input news data 50 and comment data 60 are word strings is described here, but the technical idea of the present invention is not limited to words; phrases or sentences may also be used. This embodiment uses an emotion word database composed of emotion categories and their intensities. The emotion word database is built in advance by specifying, for every word in a large word database such as a Japanese dictionary, the proportion in which each of the nine emotion categories used in forming human facial expressions ("joy", "sadness", "anger", "disgust", "fear", "guilt", "shame", "interest", and "surprise") is present, with each emotion intensity given in the range 0 to 1 in 10 levels at increments of 0.1.

When the input news data 50 or comment data 60 is a phrase or a sentence, an emotion phrase database or emotion sentence database, composed of the emotion categories and their intensities for the phrase or sentence as a whole, may be used in the same way as for words. Here, the intensity of "joy" is denoted by S_1, that of "sadness" by S_2, that of "anger" by S_3, that of "disgust" by S_4, that of "fear" by S_5, that of "guilt" by S_6, that of "shame" by S_7, that of "interest" by S_8, and that of "surprise" by S_9. These intensities satisfy Equation (3).

Words that match or resemble the input news data 50 and comment data 60 are then searched for in the emotion word database, and the emotion classes and intensities of those words are extracted to generate the emotion data 80. That is, the emotion classification of the news data 50 is expressed by equation (4).

The emotion classification of the comment data 60 is expressed by equation (5).

Each is thus represented as a nine-dimensional vector. Here, k denotes the ID of the input comment data 60. The overall intensity W of the input news data 50 and comment data 60 is expressed by equation (6).

For example, when "laughing through tears" (泣き笑い) is input as the comment data 60, equation (7) is generated as the emotion data 80.
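The lookup described above can be sketched as follows. This is a minimal illustration, not the patent's method: the database entries, the English keys, and the function name `emotion_vector` are hypothetical, only exact matches are handled (the "similar word" search is omitted), and combining several matched words by an element-wise maximum is an assumption, since equations (4) to (7) are not reproduced in this text.

```python
from typing import Dict, List

# The nine emotion classes, in the order S_1 ... S_9 used in the text.
EMOTIONS = ["joy", "sadness", "anger", "disgust", "fear",
            "guilt", "shame", "interest", "surprise"]

# Hypothetical emotion word database: word -> nine intensities in [0, 1],
# quantized in steps of 0.1 as the text describes.
EMOTION_WORDS: Dict[str, List[float]] = {
    "laugh": [0.8, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 0.1],
    "cry":   [0.0, 0.9, 0.0, 0.0, 0.1, 0.0, 0.0, 0.0, 0.0],
}

def emotion_vector(words: List[str]) -> List[float]:
    """Return a 9-dimensional emotion vector for a word string.

    Assumption: intensities of matched words are merged with an
    element-wise maximum, which keeps every component within [0, 1].
    Words missing from the database contribute nothing.
    """
    result = [0.0] * len(EMOTIONS)
    for word in words:
        entry = EMOTION_WORDS.get(word)
        if entry is not None:
            result = [max(r, e) for r, e in zip(result, entry)]
    return result

if __name__ == "__main__":
    print(dict(zip(EMOTIONS, emotion_vector(["cry", "laugh"]))))
```

A mixed input such as `["cry", "laugh"]` yields a vector with both a high "joy" and a high "sadness" component, which is the kind of result the "laughing through tears" example points at.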

The subjectivity 90, in turn, is obtained by using equation (2) to calculate the subjectivity P(N) of the news data 50 and the subjectivity P(C_k) of the comment data 60.

Next, the PC 40 generates temporal animation data 70-1 based on dynamic feature values of the emotion or subjectivity contained in the emotion data 80 or subjectivity 90 analyzed in step S4 (step S5). In this embodiment, an animation database is first prepared. Each entry pairs animation data for a whole-body motion or facial-expression change of the character with a similarity parameter for that animation data, calculated using each word in a representative word database, such as a Japanese dictionary, as a parameter. The animation data referred to here describes the spatial coordinates at which every bone or polygon of the character is located at any given time.

A similarity parameter is calculated for the news data 50 and comment data 60, its cosine similarity with the similarity parameter of every animation data item in the animation database is computed, and the animation data with the largest value is selected as the character's animation data 70. The case where the selected animation data 70 records the spatial coordinates of arbitrary bones of the character is described here. In this embodiment, emotionless animation data and animation data for each emotion class are learned for all prerecorded whole-body motions or facial-expression changes, and the conversion from emotionless animation data to the animation data of each emotion class is defined in advance. For each emotion class i for which the variance V_i of the emotion data 80 analyzed in step S4 (equation (8)) or the variance V_i of the subjectivity 90 (equation (9)) exceeds a threshold, the selected animation data 70 is synthesized and converted according to the dynamic feature value Δ_i (equations (10) and (11)), generating temporal animation data 70-1 for the character's whole-body motion or facial-expression change.
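The selection step (pick the animation whose similarity parameter has the highest cosine similarity with that of the input) can be sketched as below. The animation names, the three-dimensional vectors, and the function names are hypothetical; how the similarity parameters themselves are computed from words is not shown.

```python
import math
from typing import Dict, List

def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def select_animation(query: List[float],
                     database: Dict[str, List[float]]) -> str:
    """Return the name of the animation most similar to the query vector."""
    return max(database, key=lambda name: cosine(query, database[name]))

# Hypothetical animation database: name -> similarity parameter vector.
ANIMATIONS = {
    "thumbs_up": [0.9, 0.1, 0.0],
    "shrug":     [0.1, 0.8, 0.3],
}

if __name__ == "__main__":
    print(select_animation([1.0, 0.0, 0.0], ANIMATIONS))
```

Cosine similarity compares direction only, so a long text and a short text with the same relative word profile select the same animation.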

In this specification, principal component analysis is used as one example of giving emotion to the character's whole-body motion or facial-expression change, but the technical idea of the present invention is not limited to principal component analysis; another method, such as nonlinear state-space mapping or machine learning, may be used instead. First, multiple animation data items for whole-body motions or facial-expression changes are prepared in advance, for example by recording, for the emotionless state and for each of the nine emotion classes used in forming human facial expressions ("joy", "sadness", "anger", "disgust", "fear", "guilt", "shame", "interest", and "surprise"), and are registered in a learning database. All registered emotionless animation data and the animation data of each emotion class are learned, and the parameters for converting emotionless animation data into the animation data of each emotion class are calculated by linear regression. Specifically, letting m (m = 1, 2, ...) denote a prepared whole-body motion or facial-expression change, the conversion parameters a^j_i and b^j_i are calculated by the linear regression of equation (12), using the derivative k^j(m) of the j-th principal component coordinate of the emotionless animation data for whole-body motion or facial-expression change m.

Here, q^j_i(m) denotes the derivative of the j-th principal component coordinate of the animation data of each emotion for whole-body motion or facial-expression change m, and i = 1 through i = 9 correspond to the emotion classes "joy", "sadness", "anger", "disgust", "fear", "guilt", "shame", "interest", and "surprise", respectively.
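As an illustration of the regression step, the sketch below fits one pair of conversion parameters for a fixed emotion class i and principal component j. Equation (12) is not reproduced in this text, so the affine form q ≈ a·k + b is an assumption suggested by the two parameters a^j_i and b^j_i; the function names and sample values are hypothetical, and the preceding principal component analysis is not shown.

```python
from typing import List, Tuple

def fit_affine(k: List[float], q: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of q ≈ a * k + b over motions m = 1, 2, ...

    k[m]: derivative of the j-th principal component coordinate of the
          emotionless animation for motion m.
    q[m]: the same quantity for the animation of emotion class i.
    """
    n = len(k)
    mean_k = sum(k) / n
    mean_q = sum(q) / n
    var_k = sum((x - mean_k) ** 2 for x in k)
    if var_k == 0.0:
        raise ValueError("k must contain at least two distinct values")
    cov = sum((x - mean_k) * (y - mean_q) for x, y in zip(k, q))
    a = cov / var_k
    b = mean_q - a * mean_k
    return a, b

def convert(a: float, b: float, k_value: float) -> float:
    """Map an emotionless derivative value to its emotional counterpart."""
    return a * k_value + b

if __name__ == "__main__":
    a, b = fit_affine([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
    print(a, b, convert(a, b, 4.0))
```

One such pair would be fitted per emotion class and per principal component. How the emotion intensity or Δ_i enters the final conversion is left out, because that depends on equation (13).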

Next, the selected animation data is synthesized and converted using the conversion parameters a^j_i and b^j_i corresponding to the calculated Δ_i, generating the temporal animation data 70-1 for the character's whole-body motion or facial-expression change. That is, the derivative p^j(h) of the j-th principal component coordinate of the temporal animation data 70-1 is given by equation (13).

As a result, when successively input news data 50 grows gloomier step by step, for example, an animation that gradually becomes more subdued can be generated as the temporal animation data 70-1. Using the subjectivity 90 also helps. For example, when the genre of the news data 50 is sports, and the news data 50 reports a drawn match with comment data 60 reading "a fair result", a "thumbs up" animation symbolizing a positive reaction can be taken from the animation database and generated as the temporal animation data 70-1 if the subjectivity 90 is high. If the subjectivity 90 is low, a "shrug" animation symbolizing a negative reaction can be generated as the temporal animation data 70-1 instead.

Next, the PC 40 generates spatial animation data 70-2 based on static feature values of the emotion or subjectivity contained in the emotion data 80 or subjectivity 90 analyzed in step S4 (step S6). For each emotion class i for which the variance V_i of the emotion data 80 analyzed in step S4 falls below the threshold, spatial animation data 70-2, such as a change of background color or a change of the character's standing position, is generated for the animation data 70 selected in step S5 according to the static feature value E_i (equations (14) and (15)).

For example, a large value of E_5 means that "fear" is persistent, so the background color is darkened slightly, which gives a dark impression even when the input news data 50 or comment data 60 has cheerful content. Likewise, a large value of E_7 means that "shame" is persistent, so the character's standing position is moved slightly farther away, which gives an embarrassed impression regardless of the content of the input news data 50 or comment data 60.
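Steps S5 and S6 split the emotion classes by how much each one varies across the analyzed inputs: classes whose variance exceeds the threshold drive the temporal animation, and classes whose variance falls below it drive the spatial animation. A minimal sketch of that split is shown below. The use of population variance, the threshold value, the handling of the exact-equality case, and the function name are assumptions, since equations (8) and (9) are not reproduced in this text; the feature values Δ_i and E_i are not computed.

```python
from statistics import pvariance
from typing import List, Tuple

def split_by_variance(vectors: List[List[float]],
                      threshold: float) -> Tuple[List[int], List[int]]:
    """Partition emotion classes into (dynamic, static) by variance.

    vectors: emotion vectors of successive inputs, in time order.
    Returns 1-based class indices i, matching the numbering in the text.
    A class whose variance equals the threshold is treated as static here.
    """
    dynamic: List[int] = []
    static: List[int] = []
    for idx in range(len(vectors[0])):
        series = [vec[idx] for vec in vectors]
        if pvariance(series) > threshold:
            dynamic.append(idx + 1)   # changing emotion: temporal animation
        else:
            static.append(idx + 1)    # persistent emotion: spatial animation
    return dynamic, static

if __name__ == "__main__":
    # Two-class toy example: class 1 fluctuates, class 2 stays constant.
    print(split_by_variance([[0.0, 0.5], [1.0, 0.5], [0.0, 0.5]], 0.1))
```

In the toy input, the first class swings between 0 and 1 and lands in the dynamic group, while the constant second class lands in the static group, mirroring the "persistent fear darkens the background" example.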

Next, the PC 40 generates image signals for the character's temporal animation data 70-1 and spatial animation data 70-2 generated in steps S5 and S6 (step S7). First, drawing of the character that presents the news data 50 begins, and image signal generation starts. Next, the character is drawn so as to reflect the temporal animation data 70-1 and spatial animation data 70-2 generated for the news data 50 in steps S5 and S6, and an image signal is generated. When image signal generation for the news data 50 finishes, drawing of the character that presents the comment data 60 begins, and image signal generation starts.

Next, the character is drawn so as to reflect the temporal animation data 70-1 and spatial animation data 70-2 generated for the comment data 60 in steps S5 and S6, and an image signal is generated. The same processing is repeated for all comment data 60 to generate the image signals for the comment data 60. If another news item is to be presented after the last comment data 60 has been processed, the process returns to step S1 and starts on the next news data 50 and comment data 60. Under the technical idea of the present invention, the character presenting the news data 50 and the character presenting the comment data 60 may be the same or different. It is then determined whether unprocessed data remains (step S7-2); if so, the process moves to step S1, and if not, to step S8.

Finally, the image signal of the character generated in step S7 is shown on the display 10 together with the image signals of the news data 50 and comment data 60 input in step S4 (step S8).

As described above, in this embodiment, highly topical news and the associated group of comments are extracted from a WEB site and input to the PC 40 as news data 50 and comment data 60, and the emotion data 80 or subjectivity 90 of the input news data 50 and comment data 60 is analyzed. Temporal animation data 70-1 is generated based on dynamic feature values of the emotion or subjectivity contained in the analyzed emotion data 80 or subjectivity 90, and spatial animation data 70-2 is generated based on static feature values of the same. The image signals of the input news data 50 and comment data 60 and of the character's generated temporal animation data 70-1 and spatial animation data 70-2 are then played back. Consequently, even when the input is informal text such as a group of SNS comments, it can be presented by a character with appropriate animation. Moreover, analyzing multiple input data items comprehensively makes it possible for a character to present information with animation that takes the aggregated data into account, such as presenting a news body in a way that reflects the comment group. Furthermore, by controlling, as the character's animation, the spatial coordinates of arbitrary bones or polygons of the character at arbitrary times during whole-body motion or facial-expression changes, content close to a real news program produced by a television station can be provided.

[Second Embodiment]
The animation generation device according to the second embodiment receives an audio signal of spoken lines and, based on audio data in which the input audio signal is recorded in time series, generates the text data of the lines and the start and end times of the lines in time series. Based on the generated text data and start and end times, it generates animation data for the character in time series. The animation data may be accompanied by emotion data; emotion is given to the animation data based on the generated emotion data, and the time length of the animation data is adjusted based on the start and end times of the lines. The audio signal of the audio data and the image signal of the generated animation data are generated according to the generated start and end times of the lines.

As a result, even when the character's lines are generated dynamically, the character can be generated without synchronization drift between the image signal and the audio signal. In addition, because an image signal is generated in which the spatial coordinates of arbitrary bones or polygons of the character are controlled as animation data, a character with complex animation matched to the lines can be generated.

FIG. 4 shows a schematic configuration of the animation generation device according to the second embodiment. The device comprises a microphone 210, a speaker 220, a display 230, and a PC (Personal Computer) 240. Audio data 250 is input to the PC 240 through the microphone 210. Although the microphone 210 is connected to the PC 240 in FIG. 4, the technical idea of the present invention is not limited to this; it suffices that audio data is input to the PC by any means.

The PC 240 is connected via a cable 240a to the speaker 220 and the display 230 of the animation generation device. Based on the audio data 250, in which the audio signal input as lines is recorded in time series, the PC 240 generates the text data of the lines or the start and end times of the lines in time series. Based on the generated text data and start and end times, it generates the character's animation data and emotion data in time series. It then gives emotion to the animation data based on the generated emotion data and adjusts the time length of the animation data based on the generated start and end times of the lines.

The PC 240 sends the audio signal of the audio data to the speaker 220 as needed, according to the generated start and end times of the lines. It likewise sends the image signal of the generated animation data to the display 230 as needed. The image generated by the PC 240 is shown on the display 230 as A1. Note that the animation generation device according to the embodiment of the present invention need not send the audio signal of the audio data or the image signal of the animation data to the speaker 220 or the display 230.

FIG. 5 is a block diagram showing the functions of the animation generation device according to the second embodiment. The line text generation unit 240-1 of the PC 240 performs recognition on the audio data 250, in which the audio signal input as lines through the microphone 210 is recorded in time series, and generates the text data 260-1 of the lines in time series. The line time determination unit 240-2 of the PC 240 detects the voiced sections of the audio data 250 and determines the start and end times 260-2 of the lines in time series. The animation generation unit 240-3 of the PC 240 analyzes the content of the text data 260-1 of the lines and generates the character's animation data 270 in time series for each pair of start and end times 260-2.

The emotion generation unit 240-4 of the PC 240 analyzes the content of the text data 260-1 of the lines and generates the character's emotion data 280 in time series. The emotion imparting unit 240-5 of the PC 240 converts the animation data 270 according to the emotion data 280. The time length adjustment unit 240-6 of the PC 240 converts the animation data 270 based on the start and end times 260-2 of the lines. The character playback unit 240-7 of the PC 240 plays back the audio signal of the audio data 250 and the image signal of the generated animation data 270 according to the start and end times 260-2 of the lines.

FIG. 6 is a flowchart showing the operation of the animation generation device according to the second embodiment. First, an audio signal is input to the PC 240 through the microphone 210, and the input audio signal is recorded in time series to form the audio data 250 (step S21). Next, the PC 240 performs recognition on the audio data 250 and generates the text data of the lines in time series (step S22).

Next, the PC 240 detects the voiced sections of the audio data 250 and determines the start and end times 260-2 of the lines in time series (step S23). When the sound pressure level in the audio data 250 stays above a threshold for at least a fixed duration, that interval is treated as a voiced section; its beginning is taken as the start time of a line and its end as the end time of the line. The text data generated in step S22 is then divided into text data 260-1, with each span from a determined start time to the corresponding end time forming one unit.
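The voiced-section rule in step S23 (level above a threshold for at least a fixed duration) can be sketched as follows. The frame-based representation of sound pressure levels, the parameter names, and the sample values are assumptions made for illustration; the text does not specify threshold or duration values.

```python
from typing import List, Sequence, Tuple

def voiced_sections(levels: Sequence[float], frame_sec: float,
                    threshold: float, min_sec: float) -> List[Tuple[float, float]]:
    """Return (start_time, end_time) pairs of voiced sections.

    levels:    sound pressure level per frame, in time order.
    frame_sec: duration of one frame in seconds.
    A run of frames counts as voiced only if every frame is above the
    threshold and the run lasts at least min_sec seconds.
    """
    sections: List[Tuple[float, float]] = []
    start = None
    # A trailing sentinel below any threshold closes a run that reaches the end.
    for idx, level in enumerate(list(levels) + [float("-inf")]):
        if level > threshold:
            if start is None:
                start = idx               # a run begins at this frame
        elif start is not None:
            if (idx - start) * frame_sec >= min_sec:
                sections.append((start * frame_sec, idx * frame_sec))
            start = None                  # the run ended (kept or discarded)
    return sections

if __name__ == "__main__":
    # 0.5 s frames: a 1.5 s run is kept, a 0.5 s blip is discarded.
    print(voiced_sections([0, 5, 5, 5, 0, 5, 0], 0.5, 1.0, 1.0))
```

Each returned pair gives one line's start and end time, which is also the unit by which the recognized text would be divided.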

Next, the PC 240 analyzes the content of the text data 260-1 divided in step S23 and, for each pair of start and end times 260-2 determined in step S23, generates the character's animation data 270, such as "close eyes", "bow", or "raise hand", in time series (step S24). In this embodiment, an animation database is prepared in which each entry pairs animation data for a whole-body motion or facial-expression change of the character with a similarity parameter for that animation data, calculated using each word in a representative word database, such as a Japanese dictionary, as a parameter.

The animation data referred to here describes the spatial coordinates at which every bone or polygon of the character is located at any given time. A similarity parameter is calculated for the text data 260-1 of the lines, its cosine similarity with the similarity parameter of every animation data item in the animation database is computed, and the animation data with the largest value is generated in time series as the character's animation data 270.

Next, the PC 240 analyzes the content of the text data 260-1 divided in step S23 and generates the character's emotion data 280, composed of emotion classes and intensities, in time series (step S25). The case where the input text data 260-1 of the lines is a word string is described here, but the technical idea of the present invention is not limited to words; the input may also be phrases or sentences. This embodiment uses an emotion word database composed of emotion classes and their respective intensities.

The emotion word database is built in advance: for every word in a large word database such as a Japanese dictionary, it specifies the proportion in which each of the nine emotion classes used in forming human facial expressions ("joy", "sadness", "anger", "disgust", "fear", "guilt", "shame", "interest", and "surprise") is present, with each emotion intensity given in the range 0 to 1, in ten levels in increments of 0.1. When the input text data 260-1 of the lines is a phrase or sentence, an emotion phrase database or emotion sentence database, composed of the emotion classes and intensities of the phrase or sentence as a whole, may be used in the same way as for words. Here, the intensity of "joy" is denoted S_1, "sadness" S_2, "anger" S_3, "disgust" S_4, "fear" S_5, "guilt" S_6, "shame" S_7, "interest" S_8, and "surprise" S_9, subject to the following equation.

Words that match or resemble the input text data 260-1 of the lines are then searched for in the emotion word database, and the emotion classes and intensities of those words are extracted to generate the emotion data 280. That is, the emotion classification of the text data 260-1 of the lines is represented by a nine-dimensional vector, as in the following equation.

The overall intensity W of the input text data 260-1 of the lines is expressed by the following equation.

For example, when "laughing through tears" (泣き笑い) is input as the text data 260-1 of the lines, the following is generated as the emotion data 280.

Next, the PC 240 converts the animation data 270 according to the emotion data 280, giving emotion to the character's whole-body motion or facial-expression change (step S26). The case where the animation data 270 records the spatial coordinates of arbitrary bones of the character in time series is described first. In this embodiment, emotionless animation data and animation data for each emotion class are learned for all prerecorded whole-body motions or facial-expression changes, and the conversion from emotionless animation data to the animation data of each emotion class is defined in advance. The animation data 270 generated in step S24 is then synthesized and converted according to the emotion data 280 generated in step S25, giving emotion to the character's whole-body motion or facial-expression change.

In this specification, principal component analysis is used as one example of giving emotion to the character's whole-body motion or facial-expression change, but the technical idea of the present invention is not limited to principal component analysis; another method, such as nonlinear state-space mapping or machine learning, may be used instead. First, multiple animation data items for whole-body motions or facial-expression changes are prepared in advance, for example by recording, for the emotionless state and for each of the nine emotion classes used in forming human facial expressions ("joy", "sadness", "anger", "disgust", "fear", "guilt", "shame", "interest", and "surprise"), and are registered in a learning database. All registered emotionless animation data and the animation data of each emotion class are learned, and the parameters for converting emotionless animation data into the animation data of each emotion class are calculated by linear regression.

Specifically, letting m (m = 1, 2, ...) denote a prepared whole-body motion or facial-expression change, the conversion parameters a^j_i and b^j_i are calculated by the linear regression of the following equation, using the derivative k^j(m) of the j-th principal component coordinate of the emotionless animation data for whole-body motion or facial-expression change m.

Here, q^j_i(m) denotes the derivative of the j-th principal component coordinate of the animation data of each emotion for whole-body motion or facial-expression change m, and i = 1 through i = 9 correspond to the emotion classes "joy", "sadness", "anger", "disgust", "fear", "guilt", "shame", "interest", and "surprise", respectively. Next, the animation data 270 generated in step S24 is synthesized and converted using the conversion parameters a^j_i and b^j_i corresponding to the emotion data 280 generated in step S25, giving emotion to the character's whole-body motion or facial-expression change. That is, the derivative p^j(m) of the j-th principal component coordinate of the animation data 270 to which emotion has been given is expressed by the following equation.

The same operation is performed on all of the animation data 270 recorded in time series, and the animation data 270 is regenerated.

Next, the case where the animation data 270 records the spatial coordinates of arbitrary polygons of the character in time series is described. In this embodiment, the animation data 270 generated in step S24 is synthesized and converted according to the emotion data 280 generated in step S25, giving emotion to the character's whole-body motion or facial-expression change. First, for the animation data of whole-body motions or facial-expression changes in each of the nine emotion classes used in forming human facial expressions ("joy", "sadness", "anger", "disgust", "fear", "guilt", "shame", "interest", and "surprise"), the amount of change from the emotionless animation data is defined in advance. For the coordinates of a given polygon P in the animation data, the difference from the emotionless state is denoted (X_1, Y_1, Z_1) for "joy", (X_2, Y_2, Z_2) for "sadness", (X_3, Y_3, Z_3) for "anger", (X_4, Y_4, Z_4) for "disgust", (X_5, Y_5, Z_5) for "fear", (X_6, Y_6, Z_6) for "guilt", (X_7, Y_7, Z_7) for "shame", (X_8, Y_8, Z_8) for "interest", and (X_9, Y_9, Z_9) for "surprise".

Then, according to the emotion data 280 generated in step S25, the animation data 270 generated in step S24 is synthesized and converted to impart emotion to the character's whole-body motion or facial expression change. That is, letting the coordinates of P in the animation data 270 be (X_0, Y_0, Z_0), the coordinates (X_E, Y_E, Z_E) of P in the emotion-imparted animation data 270 are expressed by the following equation.
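As an illustration only, the synthesis of the per-emotion differences into the coordinates of polygon P might be sketched as a weighted sum. This sketch assumes that the emotion data 280 supplies one intensity weight per emotion and that the predefined differences are combined linearly; neither assumption is confirmed by the text, and the function name `apply_emotion`, the weight dictionary, and the emotion labels are hypothetical.

```python
from typing import Dict, Tuple

Vec3 = Tuple[float, float, float]

# Labels for the nine emotion categories (hypothetical identifiers).
EMOTIONS = ("joy", "sadness", "anger", "disgust", "fear",
            "guilt", "shame", "interest", "surprise")


def apply_emotion(base: Vec3,
                  offsets: Dict[str, Vec3],
                  weights: Dict[str, float]) -> Vec3:
    """Return (X_E, Y_E, Z_E) for one polygon P.

    base    -- (X_0, Y_0, Z_0), the emotionless coordinates of P
    offsets -- per-emotion differences (X_i, Y_i, Z_i) from the emotionless state
    weights -- ASSUMED per-emotion intensities taken from the emotion data

    The linear weighted sum below is an assumption made for illustration.
    """
    x, y, z = base
    for name in EMOTIONS:
        w = weights.get(name, 0.0)          # missing emotion: no contribution
        dx, dy, dz = offsets.get(name, (0.0, 0.0, 0.0))
        x += w * dx
        y += w * dy
        z += w * dz
    return (x, y, z)
```

Under these assumptions, a weight of 0 for every emotion leaves P at its emotionless coordinates, and a weight of 1 for a single emotion reproduces the predefined animation for that emotion.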

Next, in the PC 240, the animation data 270 generated in step S24 or step S26 is converted based on the dialogue start time and end time 260-2 determined in step S23, and the time length of the animation data 270 is adjusted (step S27). In the present embodiment, the spatial coordinates of bones or polygons in the animation data for a series of whole-body motions or facial expression changes of the character are divided in advance according to the content of the movement, such as a "start part", a "main part", and a "convergence part". The times at the boundaries between these parts are designated as key frames, and the animation data 270 is converted using an "animation graph" that defines the transitions between key frames.

In this specification, the time length of the animation data 270 is adjusted using an animation graph as an example. However, the technical idea of the present invention is not limited to animation graphs, and the time length of the animation data 270 may also be adjusted using machine learning or a probabilistic model such as a Hidden Markov Model. First, for all animation data in the animation database prepared in step S24, the "start part", "main part", and "convergence part" are determined in advance, key frames are designated, and the animation graph described above is prepared.

Next, the time length of the dialogue text data 260-1 divided in step S23 is obtained from the dialogue start time and end time determined in step S23. Then, the time length of the "main part" of the animation data 270 is repeatedly added to the time length of the animation data 270 generated in step S24 or step S26 until the difference from the time length of the dialogue is smallest, and the number of repetitions is retained. Finally, the animation data 270 is regenerated by inserting the "main part" animation data, as many times as the retained number of repetitions, immediately before the "convergence part" of the animation data 270. In this specification, the time length is adjusted by repeating the "main part" of the animation data as an example. However, the technical idea of the present invention is not limited to repetition of the "main part", and the time length of the animation data 270 may also be adjusted by controlling the speed of the animation data or by truncating the "start part" or the "convergence part".
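The repetition-based adjustment described above can be sketched as follows. This is a minimal illustration, assuming time lengths are given in seconds and that each part of the animation is held as a list of frames; the names `extra_main_repeats` and `stretch_animation` are hypothetical.

```python
from typing import List, Sequence


def extra_main_repeats(anim_len: float, main_len: float, line_len: float) -> int:
    """Number of additional "main part" repetitions that brings the
    animation length closest to the dialogue length."""
    if main_len <= 0:
        raise ValueError("main part must have a positive time length")
    if line_len <= anim_len:
        return 0  # already as long as the dialogue; nothing to add
    n = int((line_len - anim_len) // main_len)
    # Compare stopping just short of the dialogue length with overshooting once.
    under = abs(anim_len + n * main_len - line_len)
    over = abs(anim_len + (n + 1) * main_len - line_len)
    return n if under <= over else n + 1


def stretch_animation(start: Sequence, main: Sequence, convergence: Sequence,
                      repeats: int) -> List:
    """Rebuild the frame sequence with the extra "main part" copies
    inserted immediately before the "convergence part"."""
    return list(start) + list(main) * (1 + repeats) + list(convergence)
```

For example, with a 3.0 s animation whose main part lasts 1.0 s and a 5.4 s line of dialogue, two extra repetitions give 5.0 s, which is closer to 5.4 s than the 6.0 s obtained with three.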

Next, in the PC 240, the audio signal of the audio data 250 and the image signal of the animation data 270 generated in step S27 are generated according to the dialogue start time and end time 260-2 determined in step S23 (step S28). First, playback of the audio data 250 is started to begin generating the audio signal, and drawing of the character is started to begin generating the image signal. Next, when the playback time of the audio data 250 reaches the dialogue start time determined in step S23, drawing of the character reflecting the animation data 270 generated in step S27 is started, and the image signal is generated.

On the other hand, when the playback time of the audio data 250 reaches the dialogue end time determined in step S23, drawing returns to the standard character, which does not reflect the animation data 270, and the image signal is generated. The same processing is repeated for the start times and end times 260-2 of all lines of dialogue recorded in time series, and the image signal of the animation data 270 is generated. If audio data 250 continues to be input from the microphone 210 after the processing for the start time and end time 260-2 of the last line of dialogue has finished, the process returns to step S21 and processing of the next audio data 250 begins.
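The switching between the animated character and the standard character in step S28 can be illustrated with a small lookup keyed on the audio playback time. This is a sketch under the assumption that the dialogue intervals are sorted by start time and do not overlap; the function name `active_line` is hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def active_line(intervals: List[Tuple[float, float]], t: float) -> Optional[int]:
    """Return the index of the line of dialogue being spoken at playback
    time t, or None when the standard (non-animated) character is drawn.

    intervals -- (start_time, end_time) pairs, assumed sorted and non-overlapping
    """
    starts = [start for start, _ in intervals]
    i = bisect_right(starts, t) - 1        # last interval starting at or before t
    if i >= 0 and t < intervals[i][1]:     # the end time itself returns to standard
        return i
    return None
```

A renderer could call this once per frame with the current playback time of the audio and draw the animation for the returned index, falling back to the standard character whenever the result is None.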

Finally, the audio data 250 is output from the speaker 220, and the generated image signal of the character is shown on the display 230 (step S29).

FIG. 10 is a diagram illustrating a data format according to the second embodiment. As described above, according to the second embodiment, the dialogue text data and the dialogue start times and end times are generated in time series based on the audio data 250, in which the audio signal input as dialogue is recorded in time series. In addition, based on the generated dialogue text data, the animation data and emotion data of the character are generated in time series for each dialogue start time and end time. Furthermore, emotion is imparted to the animation data based on the generated emotion data, and the time length of the animation data is adjusted based on the generated dialogue start time and end time. The PC 240 then transmits the audio signal of the audio data to the speaker 220 as needed, according to the generated dialogue start time and end time. Likewise, the PC 240 transmits the image signal of the animation data to the display 230 as needed.

As a result, in the second embodiment, even when the character's dialogue is generated dynamically, the character can be generated without synchronization drift between the image signal and the audio signal. In addition, because an image signal that controls the spatial coordinates of arbitrary bones or polygons of the character is generated as the animation data, a character with complex animation matched to the dialogue can be generated.

As described above, according to the present invention, arbitrary information is extracted from a WEB site and input; the emotion or subjectivity in a plurality of input data is analyzed; a temporal animation is generated based on a dynamic feature quantity of the emotion or subjectivity included in the analyzed plurality of input data; a spatial animation is generated based on a static feature quantity of the emotion or subjectivity included in the analyzed plurality of input data; and the input data is presented while the generated animation of the character is played back. Therefore, even if the input data is informal text data such as a group of SNS comments, it can be presented by a character with an appropriate animation. Moreover, if the plurality of input data is analyzed comprehensively, information can be presented by a character whose animation reflects the aggregated data, as when presenting a news body that reflects a group of comments. Furthermore, if the spatial coordinates of arbitrary bones or polygons of the character at an arbitrary time during the character's whole-body motion or facial expression change are controlled as the animation of the character, content close to an actual news program produced by a television station can be provided.

10 News extraction server
10-1 News extraction server
20 Speaker
30 Display
40 PC
40-1 Input data analysis unit
40-2 Temporal animation generation unit
40-3 Spatial animation generation unit
40-4 Character reproduction unit
40a Cable
50 News data
60 Comment data
70-1 Temporal animation data
70-2 Spatial animation data
80 Emotion data
90 Subjectivity
210 Microphone
220 Speaker
230 Display
240 PC
240-1 Dialogue text generation unit
240-2 Dialogue time determination unit
240-3 Animation generation unit
240-4 Emotion generation unit
240-5 Emotion imparting unit
240-6 Time length adjustment unit
240-7 Character reproduction unit
240a Cable
250 Audio data
260-1 Text data
260-2 Start time and end time
270 Animation data
280 Emotion data

Claims

An animation generation apparatus that generates an animation of a character based on arbitrary information, comprising:
a temporal animation generation unit that generates a temporal animation based on a dynamic feature quantity of emotion or subjectivity included in a plurality of analyzed input data; and
a spatial animation generation unit that generates a spatial animation based on a static feature quantity of emotion or subjectivity included in the analyzed plurality of input data.

The animation generation apparatus according to claim 1, further comprising an information extraction unit that extracts the input data from a WEB (World Wide Web) site.

The animation generation apparatus according to claim 1, further comprising an input data analysis unit that analyzes emotion or subjectivity in the plurality of input data.

The animation generation apparatus according to any one of claims 1 to 3, further comprising a character reproduction unit that presents the input data and reproduces the animation of the generated character.

The animation generation apparatus according to claim 1, further comprising an audio recording unit that acquires audio of the input data read aloud by a narrator.

The animation generation apparatus according to claim 1, further comprising a voice synthesis unit that synthesizes voice data corresponding to the input data.

The animation generation apparatus according to claim 1, further comprising an audio reproduction unit that reproduces the acquired voice or the synthesized voice together with the animation of the generated character.

The animation generation apparatus according to claim 1, further comprising a dialogue time determination unit that detects voiced sections in the audio of the input data and determines a start time and an end time of the character's dialogue.

The animation generation apparatus according to claim 8, further comprising a time length adjustment unit that converts the animation data in accordance with the start time of the dialogue and the end time of the dialogue.

The animation generation apparatus according to any one of claims 1 to 9, wherein the animation is the spatial coordinates of an arbitrary bone or polygon of the character at an arbitrary time when the character performs a whole-body motion or changes its facial expression.

An animation generation method for generating an animation of a character based on arbitrary information, the method comprising at least:
generating a temporal animation based on a dynamic feature quantity of emotion or subjectivity included in a plurality of analyzed input data; and
generating a spatial animation based on a static feature quantity of emotion or subjectivity included in the analyzed plurality of input data.

A program for an animation generation apparatus that generates an animation of a character based on arbitrary information, the program causing a computer to execute a series of processes comprising:
a process of generating a temporal animation based on a dynamic feature quantity of emotion or subjectivity included in a plurality of analyzed input data; and
a process of generating a spatial animation based on a static feature quantity of emotion or subjectivity included in the analyzed plurality of input data.