JP2020160341A

JP2020160341A - Video output system

Info

Publication number: JP2020160341A
Application number: JP2019061491A
Authority: JP
Inventors: 扇間　敬幸; Atsuyuki Senma; 敬幸扇間
Original assignee: Daikoku Denki Co Ltd
Current assignee: Daikoku Denki Co Ltd
Priority date: 2019-03-27
Filing date: 2019-03-27
Publication date: 2020-10-01
Anticipated expiration: 2039-03-27
Also published as: JP6656447B1

Abstract

PROBLEM TO BE SOLVED: To provide a video output system capable of presenting a person so that viewers feel closer to that person, even when information on the person is difficult to obtain.
SOLUTION: A digital portrait 1 comprises a video generation unit that applies image processing to a still image of a captured human face to generate a video with eye blinks, mouth opening and closing, expression changes, and the like, and a voice conversion unit that processes a voice input through an external microphone to change its tone. The converted voice is output together with the video generated by the video generation unit, which changes in synchronization with the voice.
SELECTED DRAWING: Figure 1

Description

The present invention relates to a video output system.

While it is possible to communicate in person with someone close to you, it is difficult to do so when the person is deceased, a celebrity, or the like. For example, Patent Document 1 below describes a technical concept in which information on target persons with whom actual communication is difficult, such as deceased persons and celebrities, is registered on a homepage server so that the information on each target person can be viewed over the Internet by browsing that person's homepage.

Patent Document 1: JP-A-2009-187514

However, if the target person is deceased, only information obtained during the person's lifetime is available, and if the target person is a celebrity, it is difficult to obtain anything beyond general information such as publicly available photographs. The information viewable on the homepage is therefore limited, which makes it difficult to feel close to the target person.

In view of the above circumstances, an object of the present invention is to provide a video output system that can present even a person on whom information is difficult to obtain in such a way that viewers feel closer to that person.

The video output system of the present invention can process a voice input via a sound-collecting microphone, convert it into a voice that differs in at least one of voice color and manner of speaking, and output the result. In addition, the video output system can generate a video synchronized with the voice from a still image of a human face and output the video together with the voice.

With the video output system of the present invention, voices with a wide variety of semantic content can be output, depending on the input voice. Therefore, even for a person on whom only limited information is available, a voice carrying varied information can be output together with the video. Outputting voice together with the video in this way lets viewers feel closer to that person.

As described above, the video output system of the present invention has the excellent characteristic that even a person on whom information is difficult to obtain can be presented so that viewers feel closer to that person.

FIG. 1 is a perspective view of the digital portrait (video output system) in Example 1.
FIG. 2 is an explanatory diagram of the digital portrait in operation in Example 1.
FIG. 3 is a block diagram showing the electrical configuration of the digital portrait in Example 1.
FIG. 4 is an explanatory diagram of the feature point extraction process in Example 1.
FIG. 5 is a flow chart showing the procedure for modeling the deceased in Example 1.
FIG. 6 is a flow chart showing the speaker information extraction process in Example 1.
FIG. 7 is a flow chart showing the video generation process and the voice conversion process in Example 1.
FIG. 8 is a flow chart showing the speaker information extraction process in Example 2.
FIG. 9 is a flow chart showing the voice conversion process in Example 2.
FIG. 10 is an explanatory diagram of the video output system in Example 4.
FIG. 11 is an explanatory diagram of the video output system in Example 5.
FIG. 12 is a flow chart showing the video generation process in Example 6.
FIG. 13 is an explanatory diagram of the stereoscopic display unit in Example 7.
FIG. 14 is an explanatory diagram illustrating a stereoscopic image in Example 7.
FIG. 15 is an explanatory diagram of another stereoscopic display unit in Example 7.
FIG. 16 is an explanatory diagram illustrating a stereoscopic image in Example 7.
FIG. 17 is a diagram showing the digital portrait in Example 8.
FIG. 18 is an explanatory diagram of the structure of the stereoscopic display unit in Example 8.
FIG. 19 is a diagram showing another digital portrait in Example 8.

Embodiments of the present invention will be described in detail using the following examples.
(Example 1)
This example relates to a digital portrait 1 capable of outputting video together with voice. It is described with reference to FIGS. 1 to 7.
The digital portrait 1, which is an example of a video output system, is used as a memorial portrait of a deceased person in the form of a video with sound. The digital portrait 1 functions as a high-performance output device through which the deceased in the memorial portrait responds to questions from attendees.

The digital portrait 1 (FIG. 1) has a black lacquered frame 1F, and the display screen 210 of a liquid crystal display 21 (see FIG. 3) is arranged inside the frame 1F. On the back of the digital portrait 1, a housing 13 slightly smaller than the frame 1F is provided to secure space for housing the electrical components.

On the outer periphery of the housing 13 (FIGS. 1 and 2), built-in speakers 223 are provided on the left and right side surfaces, along with a power switch 131 and external terminals 132 to 135. The external terminals include a speaker terminal 134, a microphone terminal 135, a USB terminal 133, and an external monitor terminal 132. A built-in camera 221 and a built-in microphone 222 are embedded in the front surface of the frame 1F. The built-in microphone 222 is a sound-collecting microphone for picking up, for example, the voice of a funeral attendee addressing the deceased. The built-in camera 221 is an imaging camera for capturing attendees as they address the deceased.

A USB keyboard, mouse, or similar device can be connected to the USB terminal 133. With a keyboard and mouse connected, the digital portrait 1 can be operated in the same way as an ordinary PC, and various settings can be made. During setup, the display screen 210 surrounded by the black frame 1F can also be used as the monitor. In addition, an external memory 39 (FIG. 3) such as an ordinary USB memory can be attached to the USB terminal 133. A storage medium such as a USB memory can be used to transfer the image data of the still image of the deceased on which the video is based, voice data of the deceased, software for extending the functions of the digital portrait 1, and the like.

An external microphone (sound-collecting microphone) 31 can be connected to the microphone terminal 135. Using the external microphone 31, an operator 300 stationed in the backyard (a back room out of the attendees' view) can input voice. A plurality of microphone terminals 135 may also be provided, in which case an external microphone for attendees to input voice can be connected. Even when attendees cannot get close to the memorial portrait displayed on the altar, an external microphone for attendees allows their questions to be picked up reliably.

An external speaker 32 can be connected to the speaker terminal 134. For example, the external speaker 32 may be installed in the backyard where the operator 300, who responds to attendees' questions, is stationed. Installing an external speaker 32 that outputs the attendees' questions increases the freedom in choosing where to locate the backyard.

A general-purpose PC monitor can be connected to the external monitor terminal 132. For example, an external monitor 33 may be installed in the backyard where the operator 300 is stationed to display images of attendees captured by the built-in camera 221. With the external monitor 33 showing the attendees, the operator 300 can check their facial expressions and gestures and respond to their questions more appropriately. Also, if a keyboard, mouse, and external monitor are available on the desk where the operator 300 works, the various setting operations and function-switching operations of the digital portrait 1 can be carried out from the backyard. A second external monitor may also be installed in the backyard, so that settings of the digital portrait 1 can be changed in parallel with responding to attendees' questions.

All functions of the external terminals can be replaced by wireless technologies such as Bluetooth (registered trademark) or wireless display. Wireless technology greatly increases the freedom in locating the backyard where the operator 300 performs setting operations and responds to questions. The backyard can even be located at a remote site; for example, over the Internet, an operator 300 at a remote location can respond to questions.

As shown in FIG. 3, the housing 13 accommodates a main board 2 that implements the functions of a video generation unit 202, a voice conversion unit 204, and so on; a display control board 211 that controls the liquid crystal display 21; a storage device such as a hard disk (HD) 24; an amplifier board 22 for the built-in speakers 223 and the built-in microphone 222; the built-in camera 221; a power supply circuit (not shown); and the like.

The main board 2 is an electronic board on which are mounted a CPU (Central Processing Unit) 20 that executes various arithmetic processes, storage elements such as a ROM 207 and a RAM 208, an I/O controller 209 that controls input and output, and the like. The digital portrait 1 has a hardware configuration similar to that of a display-integrated PC (Personal Computer). The digital portrait 1 may run a general-purpose OS (Operating System) such as Windows (registered trademark) or Linux (registered trademark), or a proprietary OS.

The hard disk 24 provides a storage area readable by the CPU 20 of the main board 2 and functions as a deceased information storage unit 240. The deceased information storage unit 240 is a storage area for the still image and voice data of the deceased as well as various other information relating to the deceased.

By executing software programs stored on the hard disk 24, the CPU 20 implements various functions: a deceased modeling unit 201 that generates a three-dimensional model of the deceased, a video generation unit 202 that generates a video of the deceased, a speaker information extraction unit 203 that extracts voice color information of the deceased (an example of speaker information), a voice conversion unit 204 that converts the voice color of the operator's response voice, a voice recognition unit 205, and so on.

Next, the functions of (1) the deceased information storage unit 240, (2) the deceased modeling unit 201, (3) the video generation unit 202, (4) the speaker information extraction unit 203, (5) the voice conversion unit 204, and (6) the voice recognition unit 205 are described in turn.

(1) Deceased information storage unit (speaker information storage unit)
The deceased information storage unit 240 stores, for example, the following items of information relating to the deceased.
(1.1) Still image of the deceased: a still image of the deceased taken in advance.
(1.2) Voice data of the deceased: voice data of the deceased recorded in advance.
(1.3) Voice color information of the deceased: information that characterizes the voice color of the deceased.
(1.4) Three-dimensional model data of the deceased: data of a three-dimensional model (wireframe model) based on the still image of the deceased.
(1.5) Information on the deceased: information such as the hobbies, favorite foods, friendships, and family relationships by marriage of the deceased. Facial images and personal information of persons who had such relationships with the deceased may also be included. The information on the deceased is preferably made available for the operator to look up as needed. With this information, the operator can respond more appropriately to attendees' questions.
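As an illustration only, items (1.1) to (1.5) could be held in a single record like the one sketched below. The class name, field names, and the `is_ready` helper are hypothetical and do not appear in the publication, which does not specify any data layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class DeceasedRecord:
    """Hypothetical record mirroring items (1.1) to (1.5); names are illustrative."""
    still_image_path: str                      # (1.1) still image taken in advance
    voice_recording_path: str                  # (1.2) voice data recorded in advance
    voice_color: Optional[List[float]] = None  # (1.3) filled in by the standby process
    model_vertices: Optional[List[Tuple[float, float, float]]] = None  # (1.4) wireframe vertices
    profile: Dict[str, str] = field(default_factory=dict)  # (1.5) hobbies, favorite foods, etc.

    def is_ready(self) -> bool:
        # The portrait can only be operated once (1.3) and (1.4) have been derived.
        return self.voice_color is not None and self.model_vertices is not None
```

A record created from only a still image and a recording reports `is_ready() == False` until the derived items have been stored.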

(2) Deceased modeling unit
The deceased modeling unit 201 consists of a feature point extraction unit that extracts facial feature points and a three-dimensional model generation unit.
(2.1) Feature point extraction unit
The feature point extraction unit executes a feature point extraction process (an example of image processing) that extracts feature points FP (see FIG. 4) for the eyes, nose, lips, eyebrows, and so on from the face region of the still image of the deceased. For the eyes, for example, the inner corner, the outer corner, the dark part of the eye, and points along the outline of the eye are extracted as feature points FP. For the mouth, for example, the corners of the mouth, points along the outline of the upper lip, and points along the outline of the lower lip are extracted as feature points FP.
(2.2) Three-dimensional model generation unit
The three-dimensional model generation unit executes a three-dimensional model generation process that generates a three-dimensional model of the deceased. This process, which is an example of image processing, generates the three-dimensional model of the deceased by deforming a standard three-dimensional model, a wireframe model representing the three-dimensional shape of a standard face, so that it fits the deceased.

The standard three-dimensional model is, for example, a wireframe model in which feature points corresponding to the eyes, nose, lips, eyebrows, and so on are defined as vertices or intersections. In the three-dimensional model generation process, the feature points extracted from the still image of the deceased are associated with the feature points of the standard three-dimensional model. The three-dimensional model of the deceased is then generated by deforming the standard three-dimensional model so that it matches the positional relationships of the feature points of the deceased. The colors and textures of the corresponding regions of the still image are preferably assigned to each surface of the wireframe model.
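The publication does not state how the standard model is deformed. One simple possibility, shown here purely as an assumption, is to move each feature-point vertex to its measured position and to shift every other vertex by an inverse-distance-weighted blend of the feature-point offsets. The function name `fit_standard_model` and its parameters are hypothetical; only x and y are adjusted because a single still image supplies 2D positions.

```python
def fit_standard_model(vertices, landmark_index, target_xy, eps=1e-9):
    """Deform a standard wireframe so its feature-point vertices match 2D targets.

    vertices:       list of (x, y, z) tuples of the standard model
    landmark_index: indices of the vertices that are feature points
    target_xy:      measured (x, y) of each feature point, in the same order
    Assumption: inverse-distance weighting; the publication names no method.
    """
    # Offset each feature point must travel to reach its measured position.
    offsets = [
        (tx - vertices[i][0], ty - vertices[i][1])
        for i, (tx, ty) in zip(landmark_index, target_xy)
    ]
    fitted = []
    for (x, y, z) in vertices:
        weight_sum = dx = dy = 0.0
        for i, (ox, oy) in zip(landmark_index, offsets):
            lx, ly, _ = vertices[i]
            dist2 = (x - lx) ** 2 + (y - ly) ** 2
            w = 1.0 / (dist2 + eps)  # nearby feature points dominate
            weight_sum += w
            dx += w * ox
            dy += w * oy
        fitted.append((x + dx / weight_sum, y + dy / weight_sum, z))
    return fitted
```

With this weighting, a vertex that is itself a feature point lands (to within rounding) on its measured position, while vertices between feature points move by a blend of the neighboring offsets.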

(3) Video generation unit
The video generation unit 202 executes a video generation process that produces motion by deforming the three-dimensional model of the deceased. In this process, the wireframe model forming the three-dimensional model is deformed locally or as a whole. For example, to close the eyelids, vertices or intersections of the wireframe model, such as the feature points that make up the eyes, are displaced. To open the mouth, the feature points that make up the mouth are displaced. To produce a nod, instead of a local deformation, the entire wireframe model is rotated slightly forward and then immediately returned to its original position. To give the deceased an angry expression, the wireframe model is deformed so as to raise the eyebrows and the outer corners of the eyes. While the deceased is speaking, for example, natural speaking behavior can be reproduced by suitably combining blinking, opening and closing of the mouth, nodding, and the like.
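The two kinds of deformation described above, local displacement of selected vertices and rotation of the whole model, can be sketched as follows. The function names, the choice of the x axis as the nodding axis, and the default angle and step count are assumptions made for illustration; which rotation sign counts as "forward" depends on the coordinate convention.

```python
import math


def displace(vertices, indices, direction, amount):
    """Local deformation: move only the selected vertices (e.g. lower lip, eyelid)."""
    selected = set(indices)
    dx, dy, dz = direction
    return [
        (x + amount * dx, y + amount * dy, z + amount * dz) if i in selected else (x, y, z)
        for i, (x, y, z) in enumerate(vertices)
    ]


def rotate_x(vertices, angle_rad):
    """Global deformation: rotate every vertex about the x axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [(x, c * y - s * z, s * y + c * z) for (x, y, z) in vertices]


def nod_frames(vertices, max_angle_rad=0.15, steps=4):
    """Tilt the whole model up to max_angle_rad, then return to the original pose."""
    going = [max_angle_rad * k / steps for k in range(1, steps + 1)]
    returning = going[-2::-1] + [0.0]
    return [rotate_x(vertices, a) for a in going + returning]
```

For example, `displace(model, lower_lip_indices, (0.0, -1.0, 0.0), 0.3)` would lower the lower-lip vertices to open the mouth, where `lower_lip_indices` is a hypothetical list of vertex indices.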

(4) Speaker information extraction unit
The speaker information extraction unit 203 executes a speaker information extraction process that separates and extracts voice color information from voice data of the deceased recorded in advance. Applying signal processing to voice data decomposes it into sound source information, which derives from the vibration of the vocal cords, and voice color information, which reflects the speaker's jaw structure, oral cavity shape, and so on. The sound source information represents the characteristics of the sound produced at the source, such as loudness and pitch. The voice color information represents the characteristics of the path through which the sound from the source travels; it describes voice quality and reflects, for example, the shape of the oral cavity through which the vocal cord vibration propagates. The voice color information extracted by the speaker information extraction unit 203 is saved on the hard disk 24, serving as the deceased information storage unit 240, as information representing the characteristics of the voice color of the deceased.
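The publication does not name the signal processing used for this decomposition. Linear predictive coding (LPC) is one widely used source-filter technique and is used below only as a stand-in: the predictor coefficients play the role of the voice color information (the transmission path), and the prediction residual plays the role of the sound source information. All function names are hypothetical, and the frame is assumed to be non-silent.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one frame of samples."""
    n = len(frame)
    return [sum(frame[t] * frame[t - lag] for t in range(lag, n))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations; returns a[1..order] with x[t] ~ sum a[k] * x[t-k]."""
    a = [0.0] * (order + 1)
    err = r[0]  # prediction error energy; must be non-zero (non-silent frame)
    for i in range(1, order + 1):
        k_i = (r[i] - sum(a[k] * r[i - k] for k in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k_i
        for k in range(1, i):
            new_a[k] = a[k] - k_i * a[i - k]
        a = new_a
        err *= 1.0 - k_i * k_i
    return a[1:]


def split_source_and_voice_color(frame, order=2):
    """Return (voice_color, source): LPC coefficients and the prediction residual."""
    voice_color = levinson_durbin(autocorrelation(frame, order), order)
    source = [
        frame[t] - sum(voice_color[k] * frame[t - 1 - k]
                       for k in range(order) if t - 1 - k >= 0)
        for t in range(len(frame))
    ]
    return voice_color, source
```

In practice the analysis would be run frame by frame on windowed audio with a higher order; the low default order here only keeps the sketch short.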

(5) Voice conversion unit
The voice conversion unit 204 executes a voice conversion process that converts (processes) the voice color of the response voice of the operator stationed in the backyard. The voice conversion unit 204 includes a sound source information extraction unit that extracts sound source information from the voice data of the response voice, and a voice synthesis unit that combines the sound source information with the voice color information of the deceased.
(5.1) Sound source information extraction unit
The sound source information extraction unit executes a sound source information extraction process that separates and extracts sound source information from the operator's voice data (response voice) input through the external microphone 31. This unit is similar in configuration to the speaker information extraction unit 203 described above: both decompose voice data into sound source information and voice color information. The difference is that the speaker information extraction unit 203 separates and extracts the voice color information, whereas the sound source information extraction unit separates and extracts the sound source information.
(5.2) Voice synthesis unit
The voice synthesis unit executes a voice synthesis process that generates a new voice by combining the sound source information extracted by the sound source information extraction unit with the voice color information of the deceased. The voice data synthesized by the voice synthesis unit is output from the built-in speakers 223.
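Under the same source-filter view, the voice synthesis step can be sketched as passing the operator's sound source signal through a filter defined by the voice color of the deceased. The sketch below assumes the voice color is represented as all-pole (linear prediction) filter coefficients, which is an assumption of this example and not something the publication specifies; `synthesize` is a hypothetical name.

```python
def synthesize(source, voice_color):
    """Drive an all-pole filter (the voice color) with a source signal.

    source:      excitation samples taken from the operator's response voice
    voice_color: coefficients a[1..p] so that y[t] = source[t] + sum a[k] * y[t-k]
    Assumption: voice color is stored as linear-prediction coefficients.
    """
    out = []
    for t, excitation in enumerate(source):
        feedback = sum(a * out[t - 1 - k]
                       for k, a in enumerate(voice_color) if t - 1 - k >= 0)
        out.append(excitation + feedback)
    return out
```

Because the loudness and pitch live in the source signal, the operator's intonation carries through, while the filter imposes the stored voice quality.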

(6) Voice recognition unit
The voice recognition unit 205 executes a voice recognition process for identifying the semantic content of attendees' questions and the operator's responses. The voice recognition unit 205 of this example does not identify the exact meaning of an utterance; instead, it identifies whether the utterance is calm, humorous, serious, sorrowful, and so on. In this voice recognition process, that character is identified from, for example, the number of phonemes (the smallest phonological units analyzed in phonology) per unit time, that is, the speech rate, along with the pitch of the voice, its volume, and the presence or absence of laughter.
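A coarse classification of this kind could be sketched as a small rule table over the listed cues. The feature names, the rules, and every threshold below are hypothetical placeholders chosen for illustration; the publication gives no concrete values or decision logic.

```python
from dataclasses import dataclass


@dataclass
class UtteranceFeatures:
    phonemes_per_second: float  # speech rate
    mean_pitch_hz: float        # pitch of the voice
    mean_volume_db: float       # volume of the voice
    has_laughter: bool          # whether laughter was detected


def classify_tone(f: UtteranceFeatures) -> str:
    """Return a coarse label; all thresholds are illustrative assumptions."""
    if f.has_laughter:
        return "humorous"
    if f.phonemes_per_second < 8.0 and f.mean_volume_db < 50.0:
        return "sorrowful"  # slow and quiet
    if f.phonemes_per_second > 14.0 or f.mean_volume_db > 70.0:
        return "serious"    # fast or loud
    return "calm"
```

The resulting label could then be used to pick matching motions in the video generation process, for example a gentle nod for a calm utterance.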

The digital portrait 1 configured as above is used, for example, at a funeral hall where the funeral of the deceased is held, as a memorial portrait for remembering the deceased, separate from the memorial portrait on the altar. The digital portrait 1 is mounted, for example, on a wall of a space set aside for remembering the deceased, where attendees can converse with the deceased shown in the digital portrait 1. The voice output by the digital portrait 1 is the voice of the operator 300 stationed in the backyard (see FIG. 2), converted into the voice color of the deceased. To attendees it feels like talking with the deceased as they were in life, allowing them to reminisce about the deceased.

Next, (a) the preparatory work and (b) the operation that make this use of the digital portrait 1 possible are described in order.
(a) Preparatory work
To operate the digital portrait 1, the still image and voice data of the deceased must be saved in advance on the hard disk 24 serving as the deceased information storage unit 240. For example, using an external memory 39 such as a USB memory holding the still image and voice data of the deceased, the data can easily be transferred to the digital portrait 1. To extract highly accurate voice color information, it is advisable to have the person read a predetermined set of conversational sentences aloud and record them while the person is still alive. The predetermined sentences should follow conversation patterns from which voice color information can be extracted with high accuracy.

In addition, the information on the deceased described in (1.5) above is preferably saved on the hard disk 24 serving as the deceased information storage unit 240. This information is useful as reference material when the operator responds to attendees' questions. As noted above, the information relating to the deceased that is stored in the deceased information storage unit 240 (still image, voice data, information on the deceased) can be transferred to the digital portrait 1 using, for example, a USB memory. An SD card with built-in flash ROM may be used instead of a USB memory. If the digital portrait 1 has a wireless communication function such as WiFi (registered trademark), the data may be transferred wirelessly from a separate PC. Furthermore, if the digital portrait 1 can connect to the Internet via WiFi (registered trademark) or the like, the data may be transferred from a separate PC or server that can reach the digital portrait 1 over the Internet.

To put the digital portrait 1 into an operable state, a predetermined standby process must be executed by a standby program stored on the hard disk 24. The standby process includes the feature point extraction process, the three-dimensional model generation process, and the speaker information extraction process described above. The standby process is described with reference to FIGS. 5 and 6.

In the standby process, as shown in FIG. 5, the CPU 20 of the main board 2 first reads the image data of the still image of the deceased from the hard disk 24 serving as the deceased information storage unit 240 (S101). It then executes the feature point extraction process, which extracts the feature points FP from the still image of the deceased (S102; see FIG. 4).

The CPU 20 of the main board 2 deforms the standard three-dimensional model according to the extracted feature points FP of the deceased, thereby generating the three-dimensional model of the deceased (S103, three-dimensional model generation process). The CPU 20 writes the three-dimensional model of the deceased to the hard disk 24 serving as the deceased information storage unit 240 (S104). This makes the three-dimensional model of the deceased available to the CPU 20 while the digital portrait 1 is in operation.

Further, in the standby process, as shown in FIG. 6, the CPU 20 of the main board 2 reads the deceased's voice data from the hard disk 24 serving as the deceased information storage unit 240 (S201). The CPU 20 then executes the speaker information extraction process, which separates and extracts voice color information from the voice data (S202), and writes the resulting voice color information to the hard disk 24 serving as the deceased information storage unit 240 (S203). The voice color information is thereby available to the CPU 20 while the digital portrait 1 is in operation.

(b) Operation
The digital portrait 1 is operated by executing a predetermined operation program. While the digital portrait 1 is in operation, a moving image of the deceased is shown on the display screen 210, and conversation with the deceased is possible. The deceased shown in the digital portrait 1 blinks and, from time to time, tilts the head or turns the face. In particular, during a conversation the deceased performs various actions, such as opening and closing the mouth in time with the speech and nodding slightly in response to what is being said.

デジタルポートレート１の故人に対面する参列者は、フレーム１Ｆに埋設された内蔵カメラ２２１によって撮像されてバックヤードの外部モニタ３３に表示される。また、その参列者が故人に問掛けた音声は、フレーム１Ｆに埋設された内蔵マイク２２２によって電気信号に変換され、デジタルポートレート１を経由して、バックヤードの外部スピーカ３２から出力される。 Attendees facing the deceased in Digital Portrait 1 are imaged by the built-in camera 221 embedded in the frame 1F and displayed on the external monitor 33 in the backyard. Further, the voice that the attendee asks the deceased is converted into an electric signal by the built-in microphone 222 embedded in the frame 1F, and is output from the external speaker 32 in the backyard via the digital portrait 1.

As described above, the backyard where the operator who attends to the attendees is stationed is equipped with, in addition to the external speaker 32, an external microphone 31 that converts the operator's voice into an electric signal, an external monitor 33 that displays the captured image of the attendee, and the like. The operator can reply to a question as appropriate while watching the attendee on the external monitor 33. The operator's reply is converted into the voice color of the deceased by voice synthesis and output from the built-in speaker 223.

上記のようなデジタルポートレート１の動作の流れを、図７のフロー図を参照して説明する。メイン基板２のＣＰＵ２０は、デジタルポートレート１の運用開始時に、まず、故人情報記憶部２４０としてのハードディスク２４から故人の３次元モデル及び声色情報を読み込む（Ｓ３０１）。 The operation flow of the digital portrait 1 as described above will be described with reference to the flow chart of FIG. When the operation of the digital portrait 1 is started, the CPU 20 of the main board 2 first reads the three-dimensional model and voice information of the deceased from the hard disk 24 as the deceased information storage unit 240 (S301).

続いてＣＰＵ２０は、参列者の音声あるいはオペレータの音声の有無、すなわち音声の入力状態であるか否かを判断する（Ｓ３０２）。音声としては、参列者の問掛け音声、問掛け音声に対するオペレータの返答音声、オペレータの挨拶音声等がある。挨拶音声は、定型文の読み上げ音声などである。定型文としては、例えば「本日は、私の葬儀に御列席頂き、まことにありがとうございます。・・・・」等の挨拶文などがある。 Subsequently, the CPU 20 determines whether or not there is a voice of the attendee or a voice of the operator, that is, whether or not the voice is in the input state (S302). The voice includes a questioning voice of attendees, an operator's response voice to the questioning voice, an operator's greeting voice, and the like. The greeting voice is a reading voice of a fixed phrase. As a fixed phrase, for example, there is a greeting such as "Thank you for attending my funeral today ....".

いずれかの音声が有る場合には（Ｓ３０２：有）、ＣＰＵ２０は、その音声の音声データの取込を実行する（Ｓ３０３）。ＣＰＵ２０は、音声の発話元がオペレータであるか参列者であるかを判断する（Ｓ３０４）。バックヤードのオペレータが発話元であるとき、ＣＰＵ２０は、音声データに対して上記の音源情報抽出処理を適用して、オペレータの音声から音源情報を分離、抽出する（Ｓ３０４：ＹＥＳ→Ｓ３０５）。そして、ＣＰＵ２０は、オペレータの音声から抽出された音源情報に対して、故人の声色情報を組み合わせる音声合成を実行する（Ｓ３０６）。一方、音声の発話元がオペレータではなく参列者であったとき（Ｓ３０４：ＮＯ）、ＣＰＵ２０は、故人の声を音声合成によって再現するための上記のＳ３０５、Ｓ３０６の処理を迂回する。 When any of the voices is present (S302: Yes), the CPU 20 executes the acquisition of the voice data of the voice (S303). The CPU 20 determines whether the voice source is an operator or an attendee (S304). When the operator in the backyard is the utterance source, the CPU 20 applies the above sound source information extraction process to the voice data to separate and extract the sound source information from the operator's voice (S304: YES → S305). Then, the CPU 20 executes voice synthesis that combines the voice color information of the deceased with the sound source information extracted from the operator's voice (S306). On the other hand, when the voice source is not the operator but the attendee (S304: NO), the CPU 20 bypasses the above-mentioned processes S305 and S306 for reproducing the voice of the deceased by voice synthesis.
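The branch at S304 can be sketched as follows. The sketch is only an illustration of the control flow: the `Voice` class, the source labels, and the two placeholder functions standing in for S305 and S306 are assumptions, and real sound source extraction and voice synthesis would operate on audio signals rather than strings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Voice:
    source: str   # "operator" or "attendee" (assumed labels)
    words: str    # stands in for the captured voice data


def extract_sound_source(voice: Voice) -> str:
    """S305 placeholder: keep what is said, discard who said it."""
    return voice.words


def synthesize(sound_source: str, voice_color: str) -> str:
    """S306 placeholder: combine sound source information with a voice color."""
    return f"<{voice_color}> {sound_source}"


def handle_voice(voice: Voice, deceased_voice_color: str) -> Optional[str]:
    """Return the synthesized reply for operator speech, or None for attendees."""
    if voice.source == "operator":                              # S304: YES
        sound_source = extract_sound_source(voice)              # S305
        return synthesize(sound_source, deceased_voice_color)   # S306
    return None                                                 # S304: NO, bypass S305 and S306


reply = handle_voice(Voice("operator", "Thank you for coming."), "deceased")
skipped = handle_voice(Voice("attendee", "How have you been?"), "deceased")
```

In both cases the captured voice still goes on to the recognition step (S307); only the conversion into the deceased's voice is skipped for attendees.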

Regardless of whether the speaker is the operator or an attendee, the CPU 20 applies voice recognition processing to the voice captured in S303 and identifies its semantic content (S307). As noted above, this voice recognition does not determine the meaning of the speech precisely. It only classifies the speech coarsely, for example as calm, humorous, serious, or sorrowful, based on features such as the number of phonemes per unit time (speech rate), the pitch, the volume, and the presence or absence of laughter.
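A coarse classification of this kind could look like the rule-based sketch below. The feature names follow the text (speech rate, pitch, volume, laughter), but every threshold and the function name `classify_mood` are assumptions chosen for illustration; the description gives no concrete values or decision rules.

```python
def classify_mood(phonemes_per_sec: float, pitch_hz: float,
                  volume_db: float, has_laughter: bool) -> str:
    """Map simple prosodic features to a coarse mood label.
    All thresholds are illustrative assumptions."""
    if has_laughter:
        return "humorous"
    if phonemes_per_sec < 6 and pitch_hz < 150 and volume_db < 50:
        return "sorrowful"      # slow, low, and quiet speech
    if phonemes_per_sec > 12 or volume_db > 70:
        return "serious"        # fast or loud speech
    return "calm"


moods = [
    classify_mood(8, 200, 60, True),
    classify_mood(4, 120, 45, False),
    classify_mood(14, 180, 60, False),
    classify_mood(8, 180, 60, False),
]
```

The resulting label is what the later deformation step would use to choose a facial expression.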

続いてＣＰＵ２０は、デジタルポートレート１に表示された故人に動きを与えるために、上記のＳ３０１で読み込んだ故人の３次元モデルを適宜、変形させる処理を実行する（Ｓ３０８）。上記のごとく、３次元モデルの変形態様としては、目のまばたきや、口の開閉や、頷きや、顔の向きの変更や、顔の表情の変化などがある。 Subsequently, the CPU 20 appropriately deforms the three-dimensional model of the deceased read in S301 in order to give motion to the deceased displayed in the digital portrait 1 (S308). As described above, the deformation modes of the three-dimensional model include blinking eyes, opening and closing the mouth, nodding, changing the direction of the face, and changing the facial expression.

When neither an attendee's voice nor the operator's voice is being input (S302: No), the deceased's motion consists mainly of blinking and of turning the face as if looking around. While an attendee's question is being input (S302: Yes → S304: NO), the deceased nods in step with the pauses and breaks in the question and changes expression according to its semantic content. While the operator is speaking (S302: Yes → ... → S304: YES), the deceased opens and closes the mouth in time with the reply and changes expression according to the semantic content of the reply.
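The three cases above amount to a small state-to-motion table, sketched here. The motion names and the function `select_motions` are hypothetical labels introduced for illustration.

```python
from typing import List


def select_motions(voice_present: bool, from_operator: bool) -> List[str]:
    """Choose the motions of the deceased from the input state (S302, S304)."""
    if not voice_present:                       # S302: no voice input
        return ["blink", "look_around"]
    if from_operator:                           # S304: YES, the reply is being spoken
        return ["mouth_open_close", "expression_change"]
    return ["nod", "expression_change"]         # S304: NO, an attendee is asking


idle = select_motions(False, False)
listening = select_motions(True, False)
speaking = select_motions(True, True)
```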

For example, to open and close the mouth in time with speech, a deformation process for opening and closing the mouth may be applied to the three-dimensional model of the deceased according to the strength of the sound source data making up the reply and the type of the sound source information (vowel, consonant, and so on). The deformation process displaces some or all of the vertices or intersections of the wireframe model that forms the three-dimensional model. To open and close the mouth, the CPU 20 displaces, as appropriate, the vertices or intersections of the wireframe model that form the contour of the mouth.
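A minimal sketch of such a vertex displacement is shown below. It assumes that the lower-lip vertices are known by index, that the sound strength is normalized to the range 0 to 1, and that consonants open the mouth less than vowels; the scaling factors and the name `open_mouth` are illustrative assumptions, not values from the description.

```python
from typing import Iterable, List, Tuple

Vertex = Tuple[float, float, float]


def open_mouth(vertices: List[Vertex], lower_lip_ids: Iterable[int],
               amplitude: float, is_vowel: bool,
               max_drop: float = 0.3) -> List[Vertex]:
    """Lower the lower-lip vertices in proportion to the sound strength.
    Consonants open the mouth less than vowels (assumed factor 0.3)."""
    lower = set(lower_lip_ids)
    level = min(max(amplitude, 0.0), 1.0)           # clamp to [0, 1]
    drop = max_drop * level * (1.0 if is_vowel else 0.3)
    return [(x, y - drop, z) if i in lower else (x, y, z)
            for i, (x, y, z) in enumerate(vertices)]


mesh = [(0.0, 0.0, 0.0), (0.0, -1.0, 0.0)]   # upper lip, lower lip
vowel_frame = open_mouth(mesh, [1], amplitude=1.0, is_vowel=True)
silent_frame = open_mouth(mesh, [1], amplitude=0.0, is_vowel=True)
```

Calling the function once per audio frame with the current strength yields the opening and closing motion over time.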

ＣＰＵ２０は、故人の３次元モデルを正面から見た２次元画像に随時、変換し、表示画面２１０の故人の画像を更新する。これにより、デジタルポートレート１に故人の動画を表示できる（Ｓ３０９）。オペレータが発話中のとき、ＣＰＵ２０は、故人の動画と同期して、上記のＳ３０６で音声合成した故人の音声を内蔵スピーカ２２３から出力する。故人の音声としては、参列者の問掛け音声に対する返答音声と、定型の挨拶文の読み上げ音声等がある。 The CPU 20 converts the three-dimensional model of the deceased into a two-dimensional image viewed from the front at any time, and updates the image of the deceased on the display screen 210. As a result, the moving image of the deceased can be displayed in the digital portrait 1 (S309). When the operator is speaking, the CPU 20 synchronizes with the moving image of the deceased person and outputs the voice of the deceased person, which is voice-synthesized in S306, from the built-in speaker 223. The voice of the deceased includes a response voice to the question voice of the attendees and a reading voice of a standard greeting sentence.
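The conversion to a front-view image can be illustrated with an orthographic projection, which simply discards depth. The description does not state which projection is used, so this choice, the pixel scaling, and the name `project_front` are assumptions.

```python
from typing import List, Tuple


def project_front(vertices: List[Tuple[float, float, float]],
                  width: int, height: int, scale: float) -> List[Tuple[int, int]]:
    """Orthographic front view: drop z and map model coordinates
    (x to the right, y upward) to pixels with the origin at the top left."""
    return [(round(width / 2 + x * scale), round(height / 2 - y * scale))
            for x, y, _z in vertices]


pixels = project_front([(0.0, 0.0, 5.0), (1.0, 1.0, 0.0)], width=200, height=100, scale=10)
```

Repeating the projection after each deformation of the model produces the successive frames of the moving image.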

The digital portrait 1 configured as described above is a high-function memorial portrait that uses digital technology to show the deceased as a moving image and allows conversation with the deceased. With the digital portrait 1, attendees can remember and reminisce about the deceased through that conversation. During the conversation, the deceased may nod in response to an attendee's question, and the deceased's expression may change with the content of the talk. To the attendee asking the question, the deceased appears to react to it, which makes the deceased feel close.

さらに参列者は、故人に問掛けたとき、故人の声色で返答を受けることができる。さらに、デジタルポートレート１の故人は、返答する際に口を開閉させたり、返答の内容によって笑ったり怒ったり等、様々な表情を見せる。参列者にとっては、自分の問掛けに応じて故人が生前と同様、誠実に返答してくれるように感じられ、故人に対して親しみを感じることができる。 Furthermore, when asking the deceased, the attendees can receive a reply in the voice of the deceased. Furthermore, the deceased person of Digital Portrait 1 shows various facial expressions such as opening and closing his mouth when replying, and laughing or getting angry depending on the content of the reply. For attendees, it feels like the deceased responds to their questions in the same way as they did in their lifetime, and they can feel familiar with the deceased.

なお、内蔵カメラ２２１による撮像画像に画像処理を施し、デジタルポートレート１の前を通り過ぎる参列者や、立ち止まる参列者を検出することも良い。通り過ぎる参列者を目で追うような動きをデジタルポートレート１の故人に行わせることも良く、前を通る参列者に対して会釈する動作を故人に行わせることも良い。 It is also possible to perform image processing on the image captured by the built-in camera 221 to detect attendees passing in front of the digital portrait 1 and attendees stopping. The deceased person in Digital Portrait 1 may be made to follow the passing attendees with his eyes, or the deceased person may be made to give a bow to the attendees passing in front of him.

A function as a person identification unit, which executes a person identification process for identifying an attendee, may be added to the CPU 20 of the main board 2. The person identification unit, for example, cuts out the face region from the image of the attendee captured by the built-in camera 221 and identifies the attendee by referring to the storage area of the hard disk 24 in which face images and the like of persons connected with the deceased are stored as part of the information on the deceased. When an attendee has been identified, the CPU 20 of the main board 2 preferably reads the information on that attendee from the hard disk 24 and presents it to the operator. One way to present it is to provide a display window that occupies part of the screen of the external monitor 33, which shows the captured image of the attendee, and to display the information on the attendee there. The backyard operator can then watch the attendee, take in the information on the attendee, and answer the attendee's question appropriately. Knowing the relationship between the attendee and the deceased and other information on the attendee lets the operator reply accurately to the question. The display may also be switched, in response to an operation by the operator, so that the content of the display window is shown enlarged on the screen of the external monitor 33. A means for notifying the operator that information on an attendee is available once the attendee has been identified may also be provided. In that case, the screen of the external monitor 33 is preferably switched to a screen showing the information on the attendee in response to a switching operation by an operator who wants that information.
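The lookup performed by the person identification unit might be sketched as a nearest-match search. The description does not say how faces are compared, so the sketch assumes that each stored face image and the cut-out face region have already been reduced to a numeric feature vector, and that cosine similarity with a fixed threshold decides the match; the registry contents and the names `cosine` and `identify` are hypothetical.

```python
import math
from typing import Dict, List, Optional


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def identify(face_vec: List[float], registry: Dict[str, List[float]],
             threshold: float = 0.9) -> Optional[str]:
    """Return the best-matching registered person, or None if nobody
    reaches the similarity threshold (an assumed value)."""
    best_name, best_score = None, threshold
    for name, vec in registry.items():
        score = cosine(face_vec, vec)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical registry of persons connected with the deceased.
registry = {"eldest son": [1.0, 0.0, 0.0], "former colleague": [0.0, 1.0, 0.0]}
known = identify([0.9, 0.1, 0.0], registry)
unknown = identify([1.0, 1.0, 0.0], registry)
```

A `None` result corresponds to the case in which no information on the attendee is shown to the operator.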

本例では、予め故人の音声データをハードディスク２４等に保存しておき、スタンバイ処理の実行により故人の声色情報（話者情報の一例）を生成してハードディスク２４に保存している。これに代えて、故人の音声データを取得したとき、直ちにその声色情報を分離、抽出しておくことも良い。そして、故人情報記憶部２４０としてのハードディスク２４に、故人の声色情報を保存しておくことも良い。 In this example, the voice data of the deceased is stored in the hard disk 24 or the like in advance, and the voice information of the deceased (an example of speaker information) is generated by executing the standby process and stored in the hard disk 24. Instead of this, when the voice data of the deceased is acquired, the voice color information may be immediately separated and extracted. Then, the voice information of the deceased may be stored in the hard disk 24 as the deceased information storage unit 240.

This example describes collecting voice data by having the deceased read aloud conversational sentences of a predetermined pattern while still alive. Alternatively, the deceased's voice data may be cut out of video footage or the like recorded during the deceased's lifetime, and the voice color information extracted from it. Another option is to have a relative with a similar voice quality, such as a sibling, parent, or child, read aloud the conversational sentences of the predetermined pattern and to extract voice color information from that reading. Close relatives often have similar oral cavity shapes and skeletal structures, which determine voice quality, so their voice color information may be usable as that of the deceased.

また、本例では、予め故人の静止画をハードディスク２４等に保存しておき、スタンバイ処理の実行により、故人の３次元モデルを生成してハードディスク２４に保存する例を説明している。これに代えて、静止画に基づく故人の３次元モデルを予め生成しておき、ハードディスク２４に保存しておくことも良い。 Further, in this example, an example is described in which a still image of the deceased is stored in the hard disk 24 or the like in advance, and a three-dimensional model of the deceased is generated by executing the standby process and stored in the hard disk 24. Instead of this, a three-dimensional model of the deceased based on a still image may be generated in advance and saved on the hard disk 24.

This example illustrates a configuration in which the operator thinks of a reply to the attendee's question. As functions that assist the operator, it illustrates displaying the captured image of the attendee on the external monitor 33 and displaying text such as information on the attendee and information on the deceased on the external monitor 33. Instead, a configuration may be adopted in which the CPU 20 of the main board 2 generates reply text and displays it on the external monitor 33. One configuration by which the CPU 20 of the main board 2 generates reply text is, for example, to identify the semantic content of the question by voice recognition and generate a sample reply according to the recognition result. The semantic content identified by this voice recognition should be detailed enough to support a conversation in response to the question. When the semantic content of the question has been identified to that level, the CPU 20 of the main board 2 can selectively retrieve, from the information on the deceased stored on the hard disk 24, the information that corresponds to it. Once that information has been retrieved, sample reply text reflecting it can be generated. An artificial-intelligence approach using, for example, machine learning can be applied to each of these processes: identifying the semantic content of the question (voice recognition), selectively retrieving the information on the deceased, and generating sample reply text based on that information. Using artificial intelligence technology allows these processes to be executed efficiently and accurately.

故人の音声データから音声合成に必要な音素を切り出し、故人の音素データとしてハードディスク２４に保存しておくことも良い。この場合には、故人の音素データに基づく音声合成により、予め用意された挨拶文等のテキストや、メイン基板２のＣＰＵ２０が生成した返答のテキスト等を、故人の声色で読み上げできる。オペレータが返答テキストを打ち込み、ＣＰＵ２０がそのテキストを音声合成する態様を採用することも良い。 It is also possible to cut out the phonemes necessary for voice synthesis from the voice data of the deceased and save them on the hard disk 24 as the phoneme data of the deceased. In this case, texts such as greetings prepared in advance and response texts generated by the CPU 20 of the main board 2 can be read aloud in the voice of the deceased by voice synthesis based on the phoneme data of the deceased. It is also possible to adopt a mode in which the operator inputs the response text and the CPU 20 voice-synthesizes the text.

Although this example illustrates a digital portrait 1 in which the deceased replies to an attendee's question, the deceased in the digital portrait 1 may instead produce no voice output, for instance only smiling without replying, or may only give a fixed greeting. The smiling motion may be triggered by ambient sound, performed at fixed intervals, triggered by detection of a person passing in front, or triggered by input of an attendee's question.

This example uses the digital portrait 1 as a memorial portrait. The digital portrait 1, which is an example of the video output system 1S, can be used widely beyond memorial portraits: as a portrait of a celebrity such as an entertainer or cultural figure, a portrait of the manager of a company or shop, a portrait of a receptionist at a department store or company, and so on. When it is used as a portrait of a company receptionist, for example, the backyard can be located remotely using the Internet so that a single operator handles reception for several companies.

This example illustrates generating a moving image that matches the voice. Alternatively, an expression pattern such as a smiling or sad expression may be specified and the moving image generated with that expression pattern; in that case, the moving image may follow the voice while keeping the specified expression.
This example also illustrates, for instance in the application to a memorial portrait, generating the moving image from the single still image that serves as the memorial portrait. For that still image, a photograph taken shortly before death is often chosen so that viewers feel close to the deceased. In addition to generating a moving image from that still image, moving images of the deceased at younger ages may be generated from, for example, still images of the deceased as a teenager, in their twenties, or in their thirties. This makes it possible not only to view the memorial portrait but also to look back on the deceased's life period by period and to remember the deceased at a particularly memorable age. Such use is valuable not only for memorial portraits but also for other portraits, such as the celebrity portraits mentioned above.

(Example 2)
This example is based on the digital portrait of Example 1 and additionally includes the deceased's way of speaking in the speaker information. It differs from Example 1 in the configuration of the speaker information extraction unit, which extracts speaker information from the deceased's voice data, and of the voice conversion unit, which converts the operator's voice. This example is described with reference to FIGS. 8 and 9. FIG. 8 is a flow chart showing the processing by the speaker information extraction unit. FIG. 9 is a flow chart showing the processing by the voice conversion unit.

実施例１では、故人の話者情報として、故人の声色を特定可能な声色情報を例示している。これに対して、本例は、声色情報に加えて、故人の話し方を特定可能な話し方情報を話者情報に含める構成例である。本例の話し方情報は、標準語を含めて秋田弁や熊本弁や名古屋弁などの話し方の種別情報である。 In the first embodiment, voice information that can identify the voice of the deceased is illustrated as the speaker information of the deceased. On the other hand, this example is a configuration example in which, in addition to the voice information, the speaker information includes the speech information that can identify the speech of the deceased. The speaking style information in this example is the type information of speaking styles such as Akita dialect, Kumamoto dialect, and Nagoya dialect, including standard language.

次に、図８を参照して話者情報抽出部が話者情報を抽出する手順を説明する。話者情報抽出部は、まず、所定パターンの会話文の音読による故人の音声データを読み込む（Ｓ４０１）。そして、故人の音声データに話者情報抽出処理を適用することで、声色情報、話し方情報を抽出する（Ｓ４０２）。 Next, a procedure for the speaker information extraction unit to extract speaker information will be described with reference to FIG. The speaker information extraction unit first reads the voice data of the deceased by reading aloud a conversational sentence having a predetermined pattern (S401). Then, by applying the speaker information extraction process to the voice data of the deceased, the voice color information and the speaking style information are extracted (S402).

The speaker information extraction unit breaks the voice data down into phonemes and applies voice recognition processing to convert the speech into text. It then cuts out the phoneme sequences that form specific words or specific phrases and identifies the dialect type from their intonation. For example, voice data in which the phoneme sequence for the place name 「なごや」 (Nagoya) is accented on its first character 「な」 (na), or in which the phoneme sequence 「あんたぁなにいっとるの」 (anta nani ittoru no, "What are you saying?") is accented on its third character 「た」 (ta) and its third-from-last character 「と」 (to), can be classified as Nagoya dialect.

As in Example 1, the speaker information extraction unit writes the voice color information and the speaking style information to the hard disk as speaker information (S403). For voice data classified as Nagoya dialect, for example, speaker information that contains, in addition to the voice color information, speaking style type information indicating the Nagoya dialect is stored on the hard disk.

次に、図９を参照して本例の音声変換部が音声を変換する手順を説明する。音声変換部は、実施例１と同様、まず、オペレータの音声を取り込み（Ｓ５０１）、音源情報を分離、抽出する（Ｓ５０２）。そして、抽出された音源情報に対して、故人の声色情報を組み合わせる音声合成を実行すると共に（Ｓ５０３）、話し方の変換処理を実行する（Ｓ５０４）。 Next, a procedure in which the voice conversion unit of this example converts voice will be described with reference to FIG. Similar to the first embodiment, the voice conversion unit first captures the operator's voice (S501), separates and extracts the sound source information (S502). Then, with respect to the extracted sound source information, voice synthesis that combines the voice color information of the deceased is executed (S503), and the speaking style conversion process is executed (S504).

In the speaking style conversion process of S504, the operator's voice is broken down into phonemes and voice recognition processing is applied to convert the speech into text. When a specific word, phrase, or the like appears, a predetermined intonation pattern is assigned to it. For the phrase 「あんた、なにいってるの」 (anta, nani itteru no), for example, the third character 「た」 (ta) and the third-from-last character 「て」 (te) should be accented. Lengthening 「た」 to 「たぁ」 and shifting the pronunciation of 「て」 from (te) toward (to) makes the phrase sound more like Nagoya dialect. The specific words or phrases whose intonation is changed depend on the speaking style type (dialect type) indicated by the speaking style information, because the characteristic words and phrases differ from one speaking style to another.
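At the text level, the conversion in S504 resembles a per-dialect lookup table, sketched below. The table layout, the dialect keys, and the name `convert_phrase` are assumptions made for illustration; the single rule is the Nagoya example from the text, with accent positions given as zero-based character indices into the converted phrase.

```python
from typing import Dict, List, Tuple

# dialect type -> {standard phrase: (dialect rendering, accented character indices)}
RULES: Dict[str, Dict[str, Tuple[str, List[int]]]] = {
    "nagoya": {
        # 「た」 lengthened to 「たぁ」 and 「て」 shifted toward 「と」;
        # accents fall on 「た」 (index 2) and 「と」 (index 9).
        "あんた、なにいってるの": ("あんたぁ、なにいっとるの", [2, 9]),
    },
}


def convert_phrase(phrase: str, dialect: str) -> Tuple[str, List[int]]:
    """Return the dialect rendering of a recognized phrase and its accent
    positions; phrases without a rule pass through with no accents."""
    rendering, accents = RULES.get(dialect, {}).get(phrase, (phrase, []))
    return rendering, list(accents)


converted, accents = convert_phrase("あんた、なにいってるの", "nagoya")
unchanged, no_accents = convert_phrase("こんにちは", "nagoya")
```

Keeping one table per dialect type matches the point that the characteristic words and phrases differ between speaking styles. An actual system would apply the accent pattern during voice synthesis rather than to text alone.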

本例のデジタルポートレート（動画出力システム）は、オペレータの音声の声色を変換すると共に話し方も変換する。このデジタルポートレートによれば、出力する音声を故人の音声により近づけることで、生前の故人の再現性を向上できる。 The digital portrait (video output system) of this example converts the voice color of the operator's voice as well as the speaking style. According to this digital portrait, the reproducibility of the deceased person in his lifetime can be improved by bringing the output sound closer to the sound of the deceased person.

As speaker information, characteristic intonation patterns in the deceased's way of speaking may also be extracted from the voice data and stored on the hard disk. Text is preferably associated with each intonation pattern. The voice conversion unit preferably converts the operator's voice into the deceased's voice by voice synthesis based on the text obtained by processing the operator's voice. Portions of the text that do not match any intonation pattern are preferably given an intonation close to that of the standard language, while portions that match the text associated with an intonation pattern are given the intonation pattern held in the speaker information.
The other configurations and actions and effects are the same as in Example 1.

(Example 3)
This example is based on the digital portrait (video output system) of Example 1 and does not require a backyard operator.
In this example, the hard disk of the digital portrait also functions as a voice pattern storage unit that stores voice patterns. The hard disk serving as the voice pattern storage unit holds voice patterns such as patterns that the deceased utters spontaneously (spontaneous patterns) and reply patterns for questions. The digital portrait generates the deceased's voice by combining (processing) the sound source information of a voice pattern with the deceased's voice color information. A voice pattern may be a prerecorded voice, for example that of an announcer, or sound source information separated and extracted from such a voice.

Spontaneous patterns include, for example, greeting patterns such as "Good morning." and "Thank you for coming." The deceased's voice based on a spontaneous pattern may be output at a predetermined time, or when someone crosses in front of the digital portrait. The greeting used when someone crosses in front is preferably varied with the time of day, for example "Good morning." in the morning and "Hello." in the daytime. The digital portrait generates and outputs the deceased's voice by combining the sound source information of the selected spontaneous pattern with the deceased's voice color information.
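Selecting a greeting by time of day can be sketched as follows. The hour boundaries and the fallback to the general welcome greeting outside those hours are assumptions; the text only gives the morning and daytime examples.

```python
from datetime import time


def select_greeting(now: time) -> str:
    """Pick a spontaneous greeting pattern for the current time of day.
    The boundaries (5:00, 11:00, 17:00) are illustrative assumptions."""
    if time(5, 0) <= now < time(11, 0):
        return "Good morning."
    if time(11, 0) <= now < time(17, 0):
        return "Hello."
    return "Thank you for coming."   # assumed fallback outside those hours


morning = select_greeting(time(8, 30))
daytime = select_greeting(time(13, 0))
evening = select_greeting(time(19, 45))
```

The selected pattern would then be combined with the deceased's voice color information before output.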

As response patterns, patterns are prepared for the questions and answers expected in conversation with attendees. The digital portrait identifies the meaning of an attendee's question voice (voice recognition processing) and selects the corresponding response pattern. It then generates and outputs the response voice of the deceased by combining the sound source information of that response pattern with the voice color information of the deceased.
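One simple way to map a recognized question to a prepared response pattern is keyword overlap, sketched below. The patent does not say how the meaning is matched to a pattern, so the keyword sets, the response texts, the fallback, and the function `select_response` are all hypothetical, and the input is assumed to be text already produced by a speech recognizer.

```python
import re

# Hypothetical response patterns (assumption): each entry pairs trigger keywords
# with the text of a prepared response.
RESPONSE_PATTERNS = [
    ({"thank", "thanks"}, "Thank you for coming today."),
    ({"remember", "memory"}, "I remember those days fondly."),
]
FALLBACK_RESPONSE = "Thank you for speaking to me."  # assumed default


def select_response(recognized_text: str) -> str:
    """Pick the response whose keywords overlap most with the recognized question."""
    words = set(re.findall(r"[a-z']+", recognized_text.lower()))
    best_text, best_score = FALLBACK_RESPONSE, 0
    for keywords, text in RESPONSE_PATTERNS:
        score = len(words & keywords)
        if score > best_score:
            best_text, best_score = text, score
    return best_text
```

A production system would more likely use intent classification, but the lookup structure (recognized meaning in, prepared pattern out) is the same.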

With the digital portrait of this example, the deceased can speak and converse in their own voice without an operator supplying voice input.
The spontaneous patterns and response patterns may also be stored as text. The voice of the deceased can then be generated by reading the text aloud in the voice color of the deceased, using the deceased's voice color information.
The voice of the deceased for a spontaneous or response pattern may be generated by speech synthesis or the like at the time of utterance. Alternatively, each pattern may be converted into the voice of the deceased in advance and saved on the hard disk serving as the voice pattern storage unit, so that a pattern can be read from the hard disk and output immediately when it is needed. The conversion into the voice of the deceased may be performed by the digital portrait itself, or the voice may be converted on a separate PC device and transferred to the digital portrait. The transfer may use a recording medium such as a USB memory, or wireless communication such as WiFi (registered trademark). The voice of the deceased may also be transferred to the digital portrait from a remote server device or the like via the Internet.
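The trade-off between synthesizing at utterance time and converting in advance can be sketched as a small cache. The class `VoicePatternStore` and the `synthesize` callable are hypothetical stand-ins for the voice pattern storage unit and the voice conversion step; an in-memory dictionary replaces the hard disk for brevity.

```python
from typing import Callable, Dict


class VoicePatternStore:
    """Keeps pre-converted audio per pattern; synthesizes on demand if missing."""

    def __init__(self, synthesize: Callable[[str], bytes]) -> None:
        # `synthesize` is an assumed function: pattern text in, audio bytes out.
        self._synthesize = synthesize
        self._cache: Dict[str, bytes] = {}

    def preconvert(self, patterns: Dict[str, str]) -> None:
        """Convert every pattern in advance so playback needs no synthesis."""
        for pattern_id, text in patterns.items():
            self._cache[pattern_id] = self._synthesize(text)

    def get(self, pattern_id: str, text: str) -> bytes:
        """Return stored audio, falling back to synthesis at utterance time."""
        if pattern_id not in self._cache:
            self._cache[pattern_id] = self._synthesize(text)
        return self._cache[pattern_id]
```

With pre-conversion, `get` never calls the synthesizer during a conversation, which is the "output immediately" benefit the text describes.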
The remaining configuration and its effects are the same as in Example 1.

(Example 4)
This example is a video output system 1S, based on Example 1, in which the digital portrait 1 displays video generated by a separate PC device 100. It is described with reference to FIG. 10.
The digital portrait 1 of this example is a rental device that a vendor provides to users. The user sends or hands over to the vendor a USB memory 391 or similar medium holding still images of the deceased, and receives back the USB memory 391 holding the video and other data generated by the separate PC device 100. The information exchanged via the USB memory 391 may include audio in addition to video.

To use the digital portrait 1, the user only needs to insert the USB memory 391 returned by the vendor into the USB terminal of the digital portrait 1. The video and other data stored on the USB memory 391 are read automatically by the CPU of the digital portrait 1, which serves as the output device, and become ready for output.

The videos prepared in advance by the vendor consist of various motion patterns, such as blinking, opening and closing of the mouth, smiling, and nodding. The digital portrait 1 displays the deceased as video by combining these motion patterns according to the content of the question voice, the content of the voice uttered by the deceased, and the surrounding situation.

In this example, the video output system 1S is formed by combining the digital portrait 1 with the separate PC device 100. Functions such as the deceased modeling unit, which generates a three-dimensional model from a still image of the deceased, and the video generation unit, which generates video of the deceased, are assigned to the external PC device 100. This reduces the processing load on the digital portrait 1 and lowers the product cost.

In this configuration, both the digital portrait 1 and the external PC device 100 must be equipped to exchange data. When data is exchanged via a recording medium such as the USB memory 391, both must be able to read data from that recording medium, either directly or indirectly.

This example uses the USB memory 391 as the recording medium for data such as still images and video, but the recording medium is not limited to the USB memory 391. Various recording media, such as SD cards and CD-Rs, can be used.

At least one of the two data transfers, namely still images and the like from the user to the vendor and video and the like from the vendor to the user, may be carried out by communication over the Internet instead of by a recording medium such as the USB memory 391.

The digital portrait 1 may have a communication function such as WiFi or wired LAN and be connected to the Internet. In that case, when a USB memory 391 holding still image or audio data is inserted into the USB terminal, the digital portrait 1 preferably reads the data automatically and sends it over the Internet to a dedicated site operated by the vendor. The dedicated site preferably generates video and the like from the still image and returns it to the digital portrait 1 that sent the data. The digital portrait 1 can then output the video and audio received from the dedicated site. With this arrangement, simply inserting a USB memory 391 holding still images and audio data of the deceased into the USB terminal enables the digital portrait 1 to output video and audio of the deceased.

Application software that generates video from a still image, or a PC device on which that software is installed, may also be provided to the user by sale, rental, or the like. The user of the digital portrait 1 can then generate video and the like on their own PC device or on the provided PC device. The generated video can be output on the digital portrait 1.

In this example, a USB memory 391 holding various video patterns is returned to the user, but a USB memory holding the three-dimensional model of the deceased needed for video generation may be returned instead. In that case, the digital portrait 1 can generate video of the deceased by deforming the three-dimensional model as appropriate.
The remaining configuration and its effects are the same as in Example 1.

(Example 5)
This example is a video output system 1S, based on the digital portrait of Example 1, in which the digital portrait 1 is operated via the Internet 101. It is described with reference to FIG. 11.

In the video output system 1S of this example, the digital portrait 1, which is the output device that outputs video and audio of the deceased, and the server device 100, which generates the video and audio, are installed in different places and connected so that they can communicate via the Internet 101. The operator 300 can respond to an attendee's question voice using a PC device (not shown) communicably connected to the server device 100.

The PC device of the operator 300 may be installed in a different place from the server device 100, since the Internet 101 allows data communication between the server device 100 and the PC device. Alternatively, the PC device of the operator 300 may be omitted and the server device 100 may be provided with the external microphone 31, the external monitor 33, the external speaker 32, and the like.

In the video output system 1S, a dedicated site managed by the server device 100 is registered in the digital portrait 1. The digital portrait 1 has an access means that automatically accesses this dedicated site when still images of the deceased, audio data of the deceased, information on the deceased, and the like are read from a recording medium such as the USB memory 391. Upon this automatic access, the still images, audio data, and information stored on the recording medium are uploaded to the server device 100.

When an attendee speaks to the deceased shown on the digital portrait 1, the question voice, a captured image of the attendee, and the like are uploaded to the server device 100 and forwarded to the PC device of the operator 300. The operator's voice is sent to the server device 100, converted into the voice of the deceased, and then sent to the digital portrait 1. The digital portrait 1 outputs the audio and video received from the server device 100. In this video output system 1S, the digital portrait 1 is therefore a terminal that only uploads the attendee's question voice and captured image and outputs the video, the voice of the deceased, and the like received via the Internet 101.

The server device 100 functions as the deceased information storage unit, the deceased modeling unit, the video generation unit, the speaker information extraction unit, the voice conversion unit, and the voice recognition unit. It identifies the meaning of the attendee's question voice, generates the three-dimensional model of the deceased, generates video of the deceased, converts the operator's response voice, and so on. In particular, it generates the video of the deceased so that the motion or facial expression matches the meaning of the voice.

In this configuration, for example, when a recording medium such as the USB memory 391 holding still images, audio, and information of the deceased is inserted into the digital portrait 1, the still images and other data are preferably sent to the server device via the Internet. Here, the USB terminal, the CPU that reads data from the USB memory, and related components form a still image acquisition unit for acquiring the still image of the deceased. The still image and other data acquired by this unit are preferably uploaded to the server device 100, which serves as the video generation unit, together with the identification information of the digital portrait 1. On the server device 100 side, the video generated by the video generation unit is preferably associated with the identification information attached to the still image from which it was generated. If the identification information of the digital portrait 1, which is the output device, is associated with the video, the video can be reliably sent to the corresponding digital portrait 1.
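The association between a still image, the identification information of its digital portrait, and the generated video can be sketched as below. The data classes, the `render` callable standing in for the video generation unit, and the outbox dictionary are all illustrative assumptions rather than structures named in the patent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class StillImageUpload:
    device_id: str  # identification information of the uploading digital portrait
    image: bytes


@dataclass
class GeneratedVideo:
    device_id: str  # copied from the originating still image
    video: bytes


def generate_video(upload: StillImageUpload,
                   render: Callable[[bytes], bytes]) -> GeneratedVideo:
    """Generate a video and carry over the identification of the source image."""
    return GeneratedVideo(device_id=upload.device_id, video=render(upload.image))


def route(videos: Iterable[GeneratedVideo]) -> Dict[str, List[bytes]]:
    """Group generated videos by the device that should receive them."""
    outbox: Dict[str, List[bytes]] = {}
    for item in videos:
        outbox.setdefault(item.device_id, []).append(item.video)
    return outbox
```

Because the identifier travels with the data from upload to output, the server never has to guess which digital portrait a finished video belongs to.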

Instead of having the still images of the deceased uploaded from the digital portrait 1 to the server device 100 automatically or semi-automatically, the user may operate the digital portrait 1 to access the dedicated site managed by the server device and upload the still images and audio of the deceased. When the still images and other data are uploaded, they are preferably associated with the identification information of the corresponding digital portrait 1.
The remaining configuration and its effects are the same as in Example 1.

(Example 6)
This example is a configuration, based on Example 1, in which the method of generating the video is changed. It is described with reference to FIG. 12.

An external camera that captures the backyard operator is connected to the digital portrait that forms the video output system of this example. The digital portrait generates the video of the deceased using the captured images of the operator.

The operation by which the digital portrait of this example generates video is described with reference to the flow chart of FIG. 12. The flow chart assumes that the three-dimensional data of the deceased has already been loaded. The processing is described below with the CPU of the main board of the digital portrait (reference numeral 20 in FIG. 3) as the acting component.

The CPU of the main board takes in a captured image of the backyard operator (S601) and cuts out the face region from the captured image (S602). The CPU then extracts the feature points of the operator's face (S603) and associates them with the feature points of the deceased (S604). Feature points of corresponding parts are associated with each other: for example, the operator's mouth with the deceased's mouth, and the operator's eyes with the deceased's eyes.

With the feature points associated in this way, the CPU detects the movement of the feature points in the captured images (video) of the operator (S605). Based on the movement of each of the operator's feature points, the CPU deforms the three-dimensional model of the deceased so that the same movement occurs at the corresponding feature point of the deceased (S606). It then displays video based on this three-dimensional model (S607).
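Steps S604 to S606 amount to transferring per-feature displacements from the operator to the deceased. The sketch below illustrates that idea in 2D with named feature points. It is a simplification under stated assumptions: the patent deforms a three-dimensional model, and the function `transfer_motion`, the feature names, and the `scale` parameter (for faces of different size) are hypothetical.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]


def transfer_motion(operator_rest: Dict[str, Point],
                    operator_now: Dict[str, Point],
                    deceased_rest: Dict[str, Point],
                    scale: float = 1.0) -> Dict[str, Point]:
    """Move each feature point of the deceased by the operator's displacement.

    Feature points are matched by name (the association of S604). Points with
    no counterpart on the operator's face stay at their rest position.
    """
    moved: Dict[str, Point] = {}
    for name, (x, y) in deceased_rest.items():
        if name in operator_rest and name in operator_now:
            dx = operator_now[name][0] - operator_rest[name][0]
            dy = operator_now[name][1] - operator_rest[name][1]
        else:
            dx, dy = 0.0, 0.0
        moved[name] = (x + scale * dx, y + scale * dy)
    return moved
```

Calling this once per captured frame and rendering the deformed model corresponds to the loop S605 to S607.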

With this configuration, the deceased in the digital portrait can be made to show the same facial expressions and movements as the operator. For example, if the operator smiles during a conversation with an attendee, the deceased in the digital portrait also smiles. Likewise, if the operator nods during the conversation, the deceased in the digital portrait nods as well.
The remaining configuration and its effects are the same as in Example 1.

(Example 7)
This example is a configuration, based on the digital portrait of Example 1, that enables stereoscopic display of the video. It is described with reference to FIGS. 13 to 16.
In this example, a stereoscopic display unit 5 for displaying the deceased three-dimensionally is built into the digital portrait.
The stereoscopic display unit 5 (FIG. 13) uses a half mirror 50 to display the deceased 511 three-dimensionally. In addition, an image 531 is displayed overlapping the three-dimensionally displayed deceased 511.

The stereoscopic display unit 5 has a space, inside a transparent glass panel 500 that faces viewers such as attendees, in which the half mirror 50 is arranged at an angle. A liquid crystal display 51 is arranged on the bottom surface of this space so that it faces the half mirror 50 obliquely. The viewer's line of sight is bent by the half mirror 50 toward the liquid crystal display 51.

Various images 531 are drawn on the wall surface 53 at the back of the space, which faces the viewer through the half mirror 50. In this example, images 531 of motifs often drawn behind a Buddha, such as lotuses and birds, are drawn on the wall surface 53. The viewer can see the images 531 through the half mirror 50.

In the stereoscopic display unit 5 of this example, part of the line of sight of a viewer looking into the glass panel 500 is bent by the half mirror 50 toward the deceased 511 shown on the liquid crystal display 51, which is arranged along the bottom surface of the space. The rest of the line of sight passes through the half mirror 50 toward the wall surface 53. As a result, the viewer facing the glass panel 500 sees the stereoscopic image 511A of the deceased and the image 531 overlapping each other (FIG. 14).
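Where the reflected image appears can be illustrated with plane-mirror geometry: the virtual image of a display point is its reflection across the mirror. The sketch below works in a 2D side view and assumes a 45-degree half mirror whose lower edge is at the origin. The angle, the coordinates, and the function name are assumptions for illustration; the patent does not give dimensions.

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # (depth z, height y) in a side view, in metres


def reflect_across_mirror(p: Point, mirror_point: Point,
                          mirror_angle_deg: float) -> Point:
    """Reflect p across the line through mirror_point at the given angle."""
    theta = math.radians(mirror_angle_deg)
    # Unit normal of a line whose direction is (cos(theta), sin(theta)).
    nz, ny = -math.sin(theta), math.cos(theta)
    # Signed distance from p to the mirror line.
    d = (p[0] - mirror_point[0]) * nz + (p[1] - mirror_point[1]) * ny
    return (p[0] - 2.0 * d * nz, p[1] - 2.0 * d * ny)


# A point 0.30 m along a display lying on the bottom surface (height 0).
display_point = (0.30, 0.0)
virtual_point = reflect_across_mirror(display_point, (0.0, 0.0), 45.0)
```

Under these assumptions `virtual_point` is approximately (0.0, 0.30): the horizontal display appears to the viewer as an upright image, while anything behind the mirror, such as the wall surface, is seen at its true depth.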

By adjusting the position of the wall surface 53 in the depth direction, the image seen by the viewer, as in FIG. 14, can be given a sense of depth. For example, moving the wall surface 53 toward the back makes the deceased 511 appear to the viewer to stand out in front of the image 531.
Instead of the liquid crystal display 51 on the bottom surface of the space, a combination of a liquid crystal projector and a projection screen may be used. In that case, the screen is placed where the liquid crystal display 51 would be, and the liquid crystal projector is arranged on the ceiling of the space facing the screen.

In this example, the wall surface 53 is a wall on which the images 531 are drawn, but a liquid crystal display may be provided on the wall instead. With a liquid crystal display on the wall, the background behind the stereoscopic image 511A of the deceased can be changed.

Instead of this example, the stereoscopic display unit 5 of FIG. 15 may be used. In that unit, the tilt of the half mirror 50 is changed, and the liquid crystal display 51 showing the deceased 511 is arranged along the ceiling of the space. A display panel 53 is arranged on the wall facing the viewer. The display panel 53 in FIG. 15 is a light source panel studded with LEDs serving as light sources, and some or all of the LEDs can be lit selectively.

Furthermore, in the stereoscopic display unit 5 of FIG. 15, the space in which the half mirror 50 is arranged is extended in the depth direction, so that the LEDs serving as light sources are located behind the stereoscopic image 511A of the deceased 511. The stereoscopic image 511A of the deceased 511 is therefore formed in front of the display panel 53.

In the stereoscopic display unit 5 of FIG. 15, the viewer sees the stereoscopic image 511A of the deceased 511 with the LEDs 538, which are light sources, located behind it (FIG. 16). Because the LEDs 538 light up farther from the viewer than the stereoscopic image 511A, they give the stereoscopic image 511A a greater sense of depth. For example, the lit LEDs may be switched so that the polygon connecting the LEDs 538 in FIG. 16 gradually shrinks while keeping a similar shape. The background then appears to recede, which makes the stereoscopic image 511A seem to the viewer to advance toward the front. A liquid crystal display may be used instead of the display panel 53 studded with the LEDs 538.
The remaining configuration and its effects are the same as in Example 1.
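The shrinking-polygon lighting described for the LEDs 538 of FIG. 16 can be sketched as a similarity scaling about the polygon's centroid, followed by snapping to LED positions. The integer LED grid, the example coordinates, and the function names below are assumptions; the patent does not specify the LED layout.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def scale_polygon(vertices: List[Point], factor: float) -> List[Point]:
    """Scale a polygon about its vertex centroid, preserving its shape."""
    cx = sum(x for x, _ in vertices) / len(vertices)
    cy = sum(y for _, y in vertices) / len(vertices)
    return [(cx + factor * (x - cx), cy + factor * (y - cy)) for x, y in vertices]


def leds_to_light(vertices: List[Point], factor: float) -> List[Tuple[int, int]]:
    """Return grid coordinates of the LEDs nearest the scaled vertices.

    Assumes LEDs sit on an integer grid (hypothetical layout).
    """
    return [(round(x), round(y)) for x, y in scale_polygon(vertices, factor)]


# Example: a square of lit LEDs shrinking step by step toward its centre.
square = [(0.0, 0.0), (8.0, 0.0), (8.0, 8.0), (0.0, 8.0)]
frames = [leds_to_light(square, f) for f in (1.0, 0.75, 0.5)]
```

Stepping through `frames` over time lights progressively smaller similar polygons, which is the receding-background effect the text describes.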

(Example 8)
This example is a configuration, based on the digital portrait of Example 7, that can display full-body video of the deceased three-dimensionally. It is described with reference to FIGS. 17 and 18. As shown in FIG. 17, the digital portrait 1 of this example can display the whole body of the deceased and can display the deceased three-dimensionally.

The structure of the stereoscopic display unit 5 of the digital portrait 1 is described with reference to FIG. 18, a cross-sectional view taken along line A-A in FIG. 17. The stereoscopic display unit 5 has a space, inside a transparent glass panel 500 that faces viewers such as attendees, in which the half mirror 50 is arranged at an angle. On a side surface of this space, a liquid crystal display 51 that shows full-body video of the deceased 511 is arranged so that it faces the half mirror 50 obliquely.

With this stereoscopic display unit 5, the viewer sees a stereoscopic image 511A formed behind the half mirror 50. This digital portrait 1 is suitable, for example, for installation at the entrance of a funeral hall.

Instead of this example, the digital portrait 1 of FIG. 19 may be used, which has a transparent screen 52 and a projector 521 that projects an image onto the transparent screen 52 obliquely upward from below. In this digital portrait 1, an image 521A of the deceased can be formed on the transparent screen 52. To the viewer, the image 521A of the deceased appears as a stereoscopic image floating in space.
The remaining configuration and its effects are the same as in Example 7.

Specific examples of the present invention have been described in detail above in the form of embodiments, but these merely disclose examples of the technology covered by the claims. Needless to say, the claims should not be interpreted restrictively on the basis of the configurations, numerical values, and the like of the specific examples. The claims cover technologies in which the specific examples are variously modified, altered, or appropriately combined using known techniques, the knowledge of those skilled in the art, and the like.

1 Digital portrait (output device)
1S Video output system
13 Housing
133 USB terminal
201 Deceased modeling unit
202 Video generation unit
203 Speaker information extraction unit
204 Voice conversion unit
205 Voice recognition unit
21 Liquid crystal display
210 Display screen
221 Built-in camera
222 Built-in microphone
223 Built-in speaker
24 Hard disk (HD)
240 Deceased information storage unit
300 Operator
31 External microphone (sound collecting microphone)
32 External speaker
33 External monitor
39 External memory
391 USB memory
5 Stereoscopic display unit
50 Half mirror
511A, 521A Stereoscopic image

The video output system of the present invention can process voice input via a sound collecting microphone, convert it into voice that differs in at least one of voice color and speaking style, and output it. Furthermore, based on a plurality of still images capturing the face of the same person at different ages, the system generates age-specific videos of that person that change in synchronization with the voice. With this video output system, age-specific videos of the same person can be output together with the voice.

Claims

A video output system comprising:
a video generation unit that applies image processing to a still image of a captured human face to generate a video including at least one of a change in facial expression, blinking of the eyes, movement of the mouth, and a change in face orientation;
a voice conversion unit that processes voice input via a sound collecting microphone, which converts sound into an electric signal, and converts it into voice differing in at least one of voice color and speaking style; and
an output device that outputs the video generated by the video generation unit and the voice converted by the voice conversion unit,
wherein the video generation unit generates a video that changes in synchronization with the voice output by the output device.

The video output system according to claim 1, further comprising a speaker information storage unit that stores speaker information capable of specifying a voice color, wherein the voice conversion unit is configured to convert voice into the voice color specified by the speaker information stored in the speaker information storage unit.

The video output system according to claim 1 or 2, further comprising a speaker information storage unit that stores speaker information capable of specifying a speaking style, wherein the voice conversion unit is configured to convert voice into the speaking style specified by the speaker information.

The video output system according to any one of claims 1 to 3, further comprising a voice pattern storage unit that stores in advance voice patterns for realizing a dialogue in response to a question voice input via the sound collecting microphone, and a voice recognition unit that identifies the meaning of the question voice,
wherein the voice conversion unit processes the voice of a voice pattern, and
the output device is configured to output the processed voice of the voice pattern that corresponds to the meaning of the question voice identified by the voice recognition unit.

The video output system according to any one of claims 1 to 4, further comprising a still image acquisition unit that acquires the still image,
wherein the still image acquired by the still image acquisition unit is transmitted to the video generation unit in association with identification information of the output device that is the output destination of the video based on that still image, and
the video generated by the video generation unit is transmitted to the output device indicated by the identification information associated with the original still image.

The video output system according to any one of claims 1 to 5, further comprising a stereoscopic display unit that displays the video generated by the video generation unit three-dimensionally.

The video output system according to claim 6, wherein the stereoscopic display unit includes a light source located behind the stereoscopic image, which is the three-dimensional display of the video.