JP2007026090A

JP2007026090A - Video preparation device

Info

Publication number: JP2007026090A
Application number: JP2005207307A
Authority: JP
Inventors: Yoshiki Yamaji; 可城山地
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2005-07-15
Filing date: 2005-07-15
Publication date: 2007-02-01

Abstract

PROBLEM TO BE SOLVED: To provide a video creation device capable of creating a moving image in which, for example, a person is converted into a character, so that the subject's face is not merely hidden but engaging video is produced.

SOLUTION: In a video content creation device 14, face image data 26 serving as source information is input to an image input unit 30. A face feature extraction unit 34 generates feature point data 42, serving as deformation information, from the supplied face image data 40. A character transformation unit 36 deforms reference character image data 24 according to the deformation information. An image composition unit 32 then generates, as video content, composite image data 46 that combines background image data 38 supplied from an image input unit 28 with the deformed character image data 44.

COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to a video creation device, and in particular to a system for creating video content in which character images are composited.

With the spread of networks, services that distribute a wide variety of content have been introduced, and as communication technology improves, moving images are becoming common in distributed content. For example, educational content (Patent Document 1) and news content containing moving images (Patent Document 2) are created and distributed.

Patent Document 1: JP 2004-207948 A
Patent Document 2: JP 2002-244957 A
Patent Document 3: Japanese Patent No. 3593067

However, to reduce production costs, current moving-image content often reuses existing video resources, such as recordings of past lectures. Footage of commentators and presenters is frequently combined with still images, such as explanatory slides, and with moving images, such as explanatory videos.

When such past video resources are reused, the commentator or speaker who appears in them may not want to be seen by an unspecified large audience. Moreover, when an employee appears as the commentator in in-house training video content, portrait rights issues can arise, so the footage may not be usable as-is. In practice, simply showing the person giving the explanation is also likely to produce uninteresting content.

The present invention aims to overcome these drawbacks of the prior art by providing a video creation device that can, for example, create a moving image in which a person is converted into a character, thereby not merely hiding the subject's face but also producing engaging video.

To solve the problems described above, the present invention comprises: first image input means for inputting a first image; information input means for inputting information reflecting the movement of first feature points, the first feature points being facial parts in a reference model image; analysis processing means for analyzing, based on this information, the positions to which each of the first feature points has moved as second feature points, and generating the second feature points as deformation information; model deformation means for creating a deformed model image by moving each of the first feature points of the model image according to the deformation information; and image composition means for compositing the first image supplied from the first image input means with the deformed model image to create a composite image.

With the video creation device of the present invention, the information input means inputs information, the analysis processing means generates deformation information from the supplied information, the model deformation means deforms the reference model image according to the deformation information, and the image composition means composites the first image with the deformed model image to produce the composite image as video content. This yields an image that moves in synchronization with the photographed person, making it easy to create video content that viewers do not find unnatural.

Next, an embodiment of the video creation device according to the present invention will be described in detail with reference to the accompanying drawings.

In this embodiment, the video creation device of the present invention is applied to a video content creation system 10. Parts not directly related to the present invention are neither illustrated nor described. In the following description, each signal is identified by the reference number of the connection line on which it appears.

As shown in FIG. 1, the video content creation system 10 includes an external information providing unit 12 and a video content creation device 14. The external information providing unit 12 includes storage units 16 and 18 and a digital camera 20. The storage unit 16 stores background image data, such as slide images or single frames of moving images, and outputs it when read. The storage unit 18 stores character image data and outputs it when read. The storage units 16 and 18 hold their image data as a database in which each item is paired with keywords that make it searchable.
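The keyword-indexed storage of units 16 and 18 can be pictured with a minimal in-memory sketch. The class and method names (`ImageStore`, `register`, `search`, `read`) are hypothetical; the patent does not describe how the database is organized.

```python
from collections import defaultdict


class ImageStore:
    """Hypothetical keyword-indexed image store (stand-in for storage units 16 and 18)."""

    def __init__(self):
        self._images = {}               # image ID -> stored image payload
        self._index = defaultdict(set)  # lower-cased keyword -> set of image IDs

    def register(self, image_id, payload, keywords):
        """Store an image and associate it with one or more search keywords."""
        self._images[image_id] = payload
        for kw in keywords:
            self._index[kw.lower()].add(image_id)

    def search(self, keyword):
        """Return IDs registered under the keyword (case-insensitive), sorted for stable output."""
        return sorted(self._index.get(keyword.lower(), set()))

    def read(self, image_id):
        """Output the stored image in response to a read request."""
        return self._images[image_id]
```

For example, after `store.register("slide-01", data, ["intro", "agenda"])`, a call to `store.search("Intro")` returns `["slide-01"]`.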

The storage unit 16 outputs background image data 22 to the video content creation device 14, and the storage unit 18 outputs character image data 24 to the video content creation device 14. The digital camera 20 photographs, in particular, a face as its subject and outputs the captured face image data 26 to the video content creation device 14. The digital camera 20 is not limited to still images; it may be a digital video camera capable of shooting moving images (movies).

The video content creation device 14 includes image input units 28 and 30, an image composition unit 32, a face feature extraction unit 34, and a character transformation unit 36. The image input unit 28 is an input interface that takes the supplied background image data 22 into the video content creation device 14 and outputs it as background image data 38 to the image composition unit 32. Likewise, the image input unit 30 is an input interface that takes the supplied face image data 26 into the device and outputs it as face image data 40 to the face feature extraction unit 34.

The face feature extraction unit 34 detects, in the supplied face image data 40, the positions of feature points defined in advance for the parts of the face. For example, starting from the feature points obtained from the previous input, it detects the positions to which those feature points have moved in the currently supplied input. The result may be expressed either as positions or as vectors having a distance and a direction. The face feature extraction unit 34 outputs the feature point data 42 obtained from the face image data 40 to the character transformation unit 36.
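The two output forms mentioned above, absolute positions or distance-and-direction vectors, carry the same information once the previous positions are known. A minimal sketch, assuming 2D pixel coordinates keyed by hypothetical part names and that both inputs contain the same parts, converts between them:

```python
import math


def displacement_vectors(prev_points, curr_points):
    """For each feature point, return (distance, direction in radians) of its movement."""
    vectors = {}
    for name, (x0, y0) in prev_points.items():
        x1, y1 = curr_points[name]
        dx, dy = x1 - x0, y1 - y0
        vectors[name] = (math.hypot(dx, dy), math.atan2(dy, dx))
    return vectors


def apply_vectors(prev_points, vectors):
    """Recover current positions from previous positions plus (distance, direction) vectors."""
    result = {}
    for name, (x0, y0) in prev_points.items():
        dist, angle = vectors[name]
        result[name] = (x0 + dist * math.cos(angle), y0 + dist * math.sin(angle))
    return result
```

Sending vectors rather than positions keeps values small when the face barely moves between frames; either form lets the receiver reconstruct the same positions.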

The character transformation unit 36 deforms the character by, for example, replacing the position data of the feature points contained in the character image data 24 read from the storage unit 18 with the feature point data 42 for the corresponding facial parts. This replacement moves the feature points of the character image, which amounts to deforming the character's face. The character transformation unit 36 outputs the deformed character image data 44 to the image composition unit 32.
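The position-replacement step can be sketched as below, assuming both the character model and the tracked data store feature points as a dictionary from a hypothetical part name to an (x, y) position already expressed in the character's coordinate system. Any scaling from face-image coordinates to character coordinates is outside this sketch.

```python
def deform_character(character_points, tracked_points):
    """Return a copy of the character's feature points with positions replaced by tracked ones.

    Parts that were not tracked keep their reference positions; tracked parts the
    character model does not define are ignored. The input dict is not modified.
    """
    deformed = dict(character_points)
    for part, position in tracked_points.items():
        if part in deformed:  # only replace parts the character model defines
            deformed[part] = position
    return deformed
```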

The image composition unit 32 renders the character image data 44 as a two- or three-dimensional character image and composites the rendered character image with the background image. It outputs the composited image data 46 as the created video content.

Next, the operation of the video content creation system 10 will be described with reference to FIGS. 2, 3, and 4. As shown in FIG. 2, slide image data is read out as an example of the background image data 22, and the digital camera 20 captures a face image of a subject 48. The image input unit 30 may receive either a face image captured by the camera 20 or a previously recorded image.

The face feature extraction unit 34 converts the face image data 40 output from the image input unit 30 into feature point data based on, for example, the orientation of the face and the position and degree of opening of each facial part, such as the eyes and mouth. The converted feature point data 42 is supplied to the character transformation unit 36, which deforms the character image in synchronization with the supplied feature point data 42 and outputs the resulting character image data 44 to the image composition unit 32.
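The patent leaves feature extraction to Patent Document 3. As one hypothetical illustration of how a "degree of opening" value might be derived from landmark positions, the sketch below divides the vertical gap of an eye or mouth by its width; the formula and function names are assumptions, not the method of the cited document.

```python
import math


def _dist(p, q):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def opening_degree(upper, lower, left, right):
    """Ratio of vertical opening to horizontal width; 0.0 means fully closed.

    upper/lower are the top and bottom landmarks (e.g. lips or eyelids),
    left/right are the corner landmarks. Normalizing by width makes the value
    independent of how large the face appears in the image.
    """
    width = _dist(left, right)
    if width == 0:
        raise ValueError("left and right corner points coincide")
    return _dist(upper, lower) / width
```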

The method of Patent Document 3 may be used to create feature point data from a face image.

The image composition unit 32 renders a character image from the character image data 44 created by the character transformation unit 36 and composites it with the background image data 38 supplied from the image input unit 28. This produces one frame of composite image data 46 of the final video content.

The video content creation system 10 can change the size, position, and orientation of the character in the composite image, and can even display the character image so that it fills the entire screen on which the composite image is shown. The system 10 can also place the character at the center even when the photographed subject is not at the center of the frame. With these functions, a composite image showing the character at an appropriate position can easily be created even from a seminar video in which the whiteboard and the presenter's whole body are filmed.
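The resizing and centering functions can be illustrated by computing the character's bounding box in the output frame. This is a sketch under assumed parameters (`scale`, `center`, and `position` are hypothetical); passing `scale=None` fills the frame while preserving the aspect ratio.

```python
def place_character(frame_w, frame_h, char_w, char_h, scale=1.0, center=True, position=(0, 0)):
    """Return (x, y, w, h) of the character's bounding box within the frame.

    scale=None fits the character to the full frame while keeping its aspect ratio.
    center=True centers the box regardless of where the subject was in the source video;
    otherwise the box's top-left corner is placed at `position`.
    """
    if scale is None:
        scale = min(frame_w / char_w, frame_h / char_h)
    w, h = round(char_w * scale), round(char_h * scale)
    if center:
        x, y = (frame_w - w) // 2, (frame_h - h) // 2
    else:
        x, y = position
    return x, y, w, h
```

For a 640x480 frame and a 100x200 character, the default call centers the character at (270, 140); with `scale=None` it grows to 240x480 and sits at (200, 0).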

To further simplify video content creation, it is convenient to use an editing screen 50 of a content editing tool, as shown in FIG. 3. The content editing tool visualizes the control of image input to the image input units 28 and 30; this control is performed by a control unit (not shown). Specifically, in the content editing tool, the image supplied to the image input unit 28 is labeled image 1 and the image supplied to the image input unit 30 is labeled image 2, as shown in FIG. 3. For image 1 and image 2, the input still image or moving image is specified in frames 52 and 54, respectively, together with the time, measured from the start, at which it is input. The editing screen 50 may represent the time range corresponding to this designation with, for example, an arrow. The content editing tool preferably controls the images input as image 1 and image 2 according to these time designations.
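The time designations on the editing screen amount to a schedule per input track. A minimal sketch, assuming each clip is given as a start time and duration in seconds (the `Clip` type and `active_source` function are hypothetical), looks up which source feeds a track at a given time:

```python
from dataclasses import dataclass


@dataclass
class Clip:
    source: str      # identifier of a still image or video (hypothetical file names)
    start: float     # seconds from the beginning of the content
    duration: float  # seconds


def active_source(track, t):
    """Return the source scheduled on this track at time t, or None if nothing is scheduled.

    The interval is half-open [start, start + duration), so back-to-back clips do not overlap.
    """
    for clip in track:
        if clip.start <= t < clip.start + clip.duration:
            return clip.source
    return None
```

With `image1 = [Clip("slide1.png", 0, 30), Clip("slide2.png", 30, 30)]`, the slide shown at t = 45 s is `"slide2.png"`, and nothing is scheduled at t = 60 s.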

The character transformation unit 36 may attach transparency information to the character image it creates. The transparency information indicates the portion of the character image other than the character, for example, the extent of the background region 56 shown in FIG. 4(a). By referring to this transparency information and compositing only the character onto the background image input to the image input unit 28, the video content image shown in FIG. 4(b) can easily be created. Changing only the character image while leaving the other input information unchanged also makes it easy to swap the displayed character. For children's content, for example, more suitable content can be created by choosing a character that matches the target audience, such as an anime character.
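Compositing with transparency information is commonly done as per-pixel alpha blending; the sketch below assumes that interpretation, with an alpha of 0.0 for the transparent background region 56 and 1.0 for the character. Grayscale pixels in nested lists stand in for real images.

```python
def composite(background, character, alpha):
    """Blend a character over a background, pixel by pixel.

    All three arguments are equally sized 2D lists. alpha holds values in [0.0, 1.0]:
    0.0 marks the transparent (non-character) region, 1.0 the character itself, and
    intermediate values give soft edges. Pixels are single grayscale numbers here.
    """
    return [
        [a * c + (1.0 - a) * b for b, c, a in zip(bg_row, ch_row, a_row)]
        for bg_row, ch_row, a_row in zip(background, character, alpha)
    ]
```

Because the background never passes through the character path, swapping in a different character image leaves the slide or landscape untouched.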

By repeating this processing, composite image data 46 is created for each frame. A moving image can then be assembled from the composite images, or the composite images can be distributed one at a time as a stream, so the desired video content is easily created. In other words, video content in which a character synchronized with the photographed person appears in place of that person can be produced in real time, which also makes immediate distribution of face-hidden video content possible.
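The per-frame repetition maps naturally onto a generator, which lets each composite frame be streamed as soon as it is produced instead of waiting for the whole video. In this sketch the three callables are hypothetical placeholders for units 34, 36, and 32:

```python
def generate_frames(face_frames, extract, deform, render_and_compose, background):
    """Yield one composite frame per input face frame, so frames can be streamed as they are made.

    extract, deform and render_and_compose are caller-supplied functions standing in for
    face feature extraction (unit 34), character transformation (unit 36) and
    image composition (unit 32).
    """
    for face in face_frames:
        points = extract(face)
        character = deform(points)
        yield render_and_compose(character, background)
```

Collecting the generator into a list gives the frames of a finished moving image; iterating it lazily gives a real-time stream.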

Character deformation in the video content creation system 10 is not limited to face images; sound, voice, or specific data may also be used. In the variants of the video content creation system 10 described below, some components of the external information providing unit 12 and the video content creation device 14 are replaced with other components. Components common to the previous embodiment carry the same reference numerals, and their description is omitted to avoid repetition.

In the video content creation system 10 shown in FIG. 5, which deforms the character based on sound, the external information providing unit 12 is provided with a microphone 58 in addition to the common components. The microphone 58 replaces the digital camera 20. It converts the collected sound into an analog signal 60 and outputs it to the video content creation device 14.

In addition to the common components, the video content creation device 14 includes an audio input unit 62 and a voice recognition unit 64. The audio input unit 62 has an interface for receiving the supplied analog signal 60, which in this embodiment is a voice signal, and converts it into a digital signal 66, which it outputs to the voice recognition unit 64.

The voice recognition unit 64 recognizes speech from the supplied digital signal 66 using known speech recognition techniques, in particular recognizing vowels. Taking the mouth as the target facial part, it then generates feature point data 68 that deforms the mouth into the shape corresponding to the recognized vowel, and outputs the feature point data 68 to the character transformation unit 36.
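The vowel-to-mouth-shape step can be sketched as a lookup table. The five entries correspond to the Japanese vowels a, i, u, e, o; the opening and width factors and the lip point names are illustrative assumptions, not figures from the patent.

```python
# Hypothetical mouth shapes: (opening factor, width factor) relative to a neutral mouth.
VOWEL_MOUTH_SHAPES = {
    "a": (1.0, 1.0),
    "i": (0.3, 1.2),
    "u": (0.3, 0.7),
    "e": (0.6, 1.1),
    "o": (0.8, 0.8),
}


def mouth_feature_points(vowel, center, base_width, base_height):
    """Return lip feature points for the recognized vowel; an unknown vowel gives a closed mouth."""
    opening, width = VOWEL_MOUTH_SHAPES.get(vowel, (0.0, 1.0))
    cx, cy = center
    half_w = base_width * width / 2
    half_h = base_height * opening / 2
    return {
        "upper_lip": (cx, cy - half_h),
        "lower_lip": (cx, cy + half_h),
        "left_corner": (cx - half_w, cy),
        "right_corner": (cx + half_w, cy),
    }
```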

In this embodiment, the character is deformed in synchronization with the voice, so that in particular a character image with a moving mouth can be composited. The voice recognition unit 64 may also output feature point data prepared in advance for facial parts other than the mouth to the character transformation unit 36, for example to make the character sway its head or blink periodically. Supplying such data makes the character in the composite image move more naturally.
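Pre-scripted motion for parts other than the mouth, such as periodic blinking and head sway, can be generated from time alone. The periods, durations, amplitudes, and output keys below are assumed values for illustration:

```python
import math


def idle_motion(t, blink_period=4.0, blink_duration=0.15, sway_amplitude_deg=3.0, sway_period=6.0):
    """Return pre-scripted values for non-mouth parts at time t (seconds).

    eye_openness is 0.0 during a short blink at the start of each blink period, else 1.0;
    head_tilt_deg follows a slow sine-wave sway.
    """
    eye_openness = 0.0 if (t % blink_period) < blink_duration else 1.0
    head_tilt_deg = sway_amplitude_deg * math.sin(2 * math.pi * t / sway_period)
    return {"eye_openness": eye_openness, "head_tilt_deg": head_tilt_deg}
```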

The video content creation system 10 of this embodiment can create video content that looks like a television program even from audio-only content such as radio. The image input to the image input unit 28 may be an ordinary background such as a landscape, but for English conversation lessons, for example, inputting slides showing the text makes the resulting video content easier to understand.

Combining this voice-based configuration with the previous embodiment based on face images allows the character to move more realistically: the mouth movement is generated from the voice and all other movement from the face image.
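The division of labor described above, mouth from voice and everything else from the face image, can be sketched as a merge of two feature point dictionaries. The mouth part names are hypothetical:

```python
MOUTH_PARTS = {"upper_lip", "lower_lip", "left_corner", "right_corner"}


def merge_feature_points(from_face, from_voice):
    """Take mouth points from voice recognition and all other parts from the face image."""
    merged = {part: pos for part, pos in from_face.items() if part not in MOUTH_PARTS}
    merged.update({part: pos for part, pos in from_voice.items() if part in MOUTH_PARTS})
    return merged
```

Filtering both inputs, rather than simply letting one overwrite the other, keeps a stray mouth point from the face tracker or a non-mouth point from the voice path from leaking into the result.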

In this case, video content in which the character's mouth moves in synchronization with the voice can be created, and video content can be created from audio alone, even without any video.

In the video content creation system 10 shown in FIG. 6, which deforms the character based on specific data, the external information providing unit 12 is provided with an operation unit 70 in addition to the common components. The operation unit 70 replaces the digital camera 20 and generates, as the specific data, instruction data that designates an action. The operation unit 70 may output to the video content creation device 14 instruction data 72 assigned to individual keys or key combinations. In this embodiment, the instruction data 72 is a serial number indicating the type of a registered action, and the operation unit 70 outputs this serial number as the instruction data 72.

In addition to the common components, the video content creation device 14 includes an action creation unit 74, which generates feature point data according to the supplied instruction data 72. The feature point data 24 for the actions the character is to perform is registered in advance in the storage unit 18 as part of the character data; the action creation unit 74 reads this data from the storage unit 18 and stores or registers it. From the stored feature point data, the action creation unit 74 outputs the feature point data 76 corresponding to the instruction data 72 to the character transformation unit 36. The feature point data 76 makes the character perform a predetermined action, such as bowing or laughing. To handle periods when no action is specified, the system preferably also includes the components described in the preceding embodiments. Creating feature point data in this way diversifies the character's movements.
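The lookup performed by the action creation unit 74 can be sketched as a registry keyed by serial number, with a fallback to tracked feature points when no action is specified. The class and method names and the keyframe format are hypothetical:

```python
class ActionCreator:
    """Hypothetical stand-in for action creation unit 74."""

    def __init__(self):
        self._actions = {}  # serial number -> list of per-frame feature point dicts

    def register(self, serial, keyframes):
        """Register the feature point sequence for an action (e.g. a bow) under a serial number."""
        self._actions[serial] = list(keyframes)

    def frames_for(self, serial, fallback):
        """Return the registered motion for the serial number.

        When serial is None or unregistered, return `fallback`, e.g. feature points
        tracked from the face image as in the earlier embodiments.
        """
        return self._actions.get(serial, fallback)
```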

With the preceding embodiments alone, the created video content may become monotonous. In this embodiment, therefore, actions such as the character walking around the slide, pointing at the part being explained, or expressing delight are inserted partway through. This reduces the monotony of the created video content.

Compositing the character with such designated actions allows more elaborate and effective presentation. For example, more effective video content can be created in which the character exaggerates emotions such as joy, anger, sorrow, and pleasure that could not be expressed in the previous embodiments.

Specific data may also be generated from sound and used for image composition. In the video content creation system 10 shown in FIG. 7, the external information providing unit 12 additionally includes the microphone 58 and a storage unit 78. The storage unit 78 stores registered voice data for words such as single words and idioms, reads it out, and outputs the stored voice data 80 to the video content creation device 14.

In addition to the common components, the video content creation device 14 includes the audio input unit 62, a voice comparison unit 82, and the action creation unit 74. The voice comparison unit 82 compares the digital signal 66 with the read voice data 80 and outputs instruction data 84, according to the result recognized by this comparison, to the action creation unit 74. The action creation unit 74 supplies the feature point data 76 corresponding to the instruction data 84 to the character transformation unit 36.

In brief, the voice comparison unit 82 compares the input voice data 66 with the registered voice data 80 and, when it recognizes them as the same word, automatically designates the associated action. For example, because the word "this" (kore) indicates something, it is registered with an action in which the character points its finger at the object, and the greeting "hello" (konnichiwa) is registered with an action such as bowing. Such registration produces effective presentation while reducing the burden on the video content creator. To handle cases where no action is registered or specified, the system preferably also includes the components described in the preceding embodiments. Creating feature point data in this way diversifies the character's movements.
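The patent does not say how the voice comparison unit 82 decides that two utterances are the same word. One simple stand-in is nearest-neighbour template matching with a distance threshold, sketched below under the assumption that each utterance has already been reduced to a fixed-length feature vector; how those vectors are computed is left open, and the action IDs are hypothetical.

```python
import math


def match_registered_word(features, templates, threshold):
    """Return the action ID of the closest registered template, or None if none is close enough.

    features:  feature vector for the input utterance (assumed fixed-length).
    templates: list of (word, feature_vector, action_id) tuples registered in advance.
    threshold: a template must be strictly closer than this (Euclidean distance) to match.
    """
    best_action, best_dist = None, threshold
    for _word, template, action_id in templates:
        dist = math.dist(features, template)  # Euclidean distance, Python 3.8+
        if dist < best_dist:
            best_action, best_dist = action_id, dist
    return best_action
```

Returning None when nothing matches lets the caller fall back to the components of the earlier embodiments, as the text recommends.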

By comparing registered voice data with input voice data in this way, video content in which the character behaves appropriately can easily be created without the video content creator having to operate the operation unit.

FIG. 1 is a block diagram showing a schematic configuration of a video content creation system to which the video creation device according to the present invention is applied.
FIG. 2 is a diagram explaining the operating principle of the video content creation system of FIG. 1.
FIG. 3 shows an example of the content editing tool screen in the video content creation system of FIG. 1.
FIG. 4 is a diagram explaining the principle of applying transparency information in the video content creation system of FIG. 1.
FIG. 5 is a block diagram showing another schematic configuration of the video content creation system of FIG. 1.
FIG. 6 is a block diagram showing another schematic configuration of the video content creation system of FIG. 1.
FIG. 7 is a block diagram showing another schematic configuration of the video content creation system of FIG. 1.

Explanation of symbols

10 Video content creation system
12 External information providing unit
14 Video content creation device
16, 18 Storage unit
20 Digital camera
28, 30 Image input unit
32 Image composition unit
34 Face feature extraction unit
36 Character transformation unit

Claims

1. A video creation device comprising:
first image input means for inputting a first image;
information input means for inputting information reflecting movement of first feature points, the first feature points being facial parts in a reference model image;
analysis processing means for analyzing, based on the information, the positions to which the respective first feature points have moved as second feature points, and generating the second feature points as deformation information;
model deformation means for creating a deformed model image by moving each of the first feature points of the model image according to the deformation information; and
image composition means for compositing the first image with the deformed model image to create a composite image.

2. The video creation device according to claim 1, wherein the information input means is second image input means for inputting a face image as a second image, and
the analysis processing means is face feature extraction means for extracting the second feature points based on the face image.

3. The video creation device according to claim 1 or 2, wherein the information input means is sound input means for inputting ambient sound, and
the analysis processing means is data creation means for recognizing the input sound and creating the second feature points according to the recognition result.

4. The video creation device according to claim 1, further comprising action creation means for receiving specified information, registering in advance second feature points to be used in correspondence with the information, and outputting the registered second feature points to the model deformation means.

5. The video creation device according to claim 1, wherein the information input means comprises sound input means for inputting ambient sound, the device further comprising: means for comparing sound registered in advance with sound output from the sound input means and outputting information corresponding to the matched sound; and
action creation means in which second feature points to be used in correspondence with the information are registered in advance, the action creation means outputting the registered second feature points to the model deformation means.