JP5531654B2

JP5531654B2 - Control information generating apparatus and shape control apparatus

Info

Publication number: JP5531654B2
Application number: JP2010024346A
Authority: JP
Inventors: 橘　　誠
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2010-02-05
Filing date: 2010-02-05
Publication date: 2014-06-25
Anticipated expiration: 2030-02-05
Also published as: JP2011164763A

Description

The present invention relates to a technique for controlling the shape of the mouth of a device having a mouth-like part in accordance with the pronunciation of speech.

For humanoid robots and CG (computer graphics) characters that can talk, there are techniques that make the character appear to be really speaking by linking the utterance of the voice with the movement of the mouth (lip sync). As an example of applying lip sync to CG, Patent Document 1 describes a technique in which image data representing images whose mouth shapes correspond to the vowel types of syllables is stored in advance, and the images are displayed in accordance with the pronunciation of the syllables. As an example of applying lip sync to a robot, there is the technique disclosed in Non-Patent Document 1.

Patent Document 1: Japanese Patent Laid-Open No. 11-66345

Non-Patent Document 1: Shinichiro Nakaoka, Fumio Kanehiro, Kanako Miura, Mitsuharu Morisawa, Kiyoshi Fujiwara, Kenji Kaneko, Shuuji Kajita, Hirohisa Hirukawa: "Development of Cybernetic Human HRP-4C: Face Motion Creation System", Proceedings of the 27th Annual Conference of the Robotics Society of Japan, 2009.

According to the prior art documents above, the mouth shape can be changed in correspondence with the vowels of syllables. In reality, however, the shape of the mouth changes in various ways as the vowel of the pronounced syllable changes, and the manner of this change greatly affects how natural the motion looks. With the prior art documents above, such fine control could not actually be performed.
In the technique described in Patent Document 1, the image corresponding to the syllable being pronounced is switched and displayed at the same time as the pronunciation. The displayed image therefore switches abruptly at the moment of pronunciation, and the motion looks unnatural to the viewer.
In the technique described in Non-Patent Document 1, the manner of change for each syllable can be set in advance so that the mouth shape changes with natural motion. However, the person making the settings has to adjust in advance the temporal relationship between the pronounced syllables and the changes in mouth shape, which is cumbersome.

The present invention has been made in view of the above circumstances. Its object is to make changes in the mouth shape natural while controlling the mouth shape of a device having a mouth-like part on the basis of pronunciation information that defines the pronunciation content of syllables.

To solve the above problem, the present invention provides a control information generation device that generates control information used to control the shape of the mouth of a device whose mouth-like part is controlled so that its shape changes in accordance with the pronunciation content of a sound generation device, the sound generation device generating sound on the basis of pronunciation information that defines a plurality of syllables and the pronunciation timing of each syllable. The control information generation device comprises: pronunciation information acquisition means for acquiring the pronunciation information; and generation means for generating, on the basis of the pronunciation information acquired by the pronunciation information acquisition means, control information having shape information indicating a shape of the mouth, timing information indicating the timing at which the mouth is to take the shape indicated by the shape information, and transition information indicating the transition time from before to after a change of the shape. The generation means generates the control information by determining the shape information according to the vowel of each syllable indicated by the pronunciation information, determining the timing information according to the pronunciation timing of that syllable, and determining the transition information according to a predetermined algorithm. When the shape indicated by one piece of shape information is the same as the shape indicated by the next piece of shape information, and the time between the timings indicated by their timing information is longer than a predetermined time, the generation means further determines shape information indicating a mouth shape different from that shape, together with timing information for that shape information indicating a timing between the two timings.
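The rule for two consecutive identical shapes can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the claimed implementation: the names `insert_intermediate`, `max_hold`, and `pick_other` are hypothetical, the midpoint is only one possible choice of a timing between the two timings, and the choice of the different shape is left to the caller.

```python
from typing import Callable, List, Tuple

Entry = Tuple[float, str]  # (timing in seconds, shape label)


def insert_intermediate(
    entries: List[Entry],
    max_hold: float,
    pick_other: Callable[[str], str],
) -> List[Entry]:
    """Insert a different shape between two identical shapes held too long.

    Assumptions (not from the source): the midpoint is used as the
    in-between timing, and pick_other() chooses the different shape.
    """
    result: List[Entry] = []
    for current, following in zip(entries, entries[1:]):
        result.append(current)
        same_shape = current[1] == following[1]
        too_long = following[0] - current[0] > max_hold
        if same_shape and too_long:
            midpoint = round((current[0] + following[0]) / 2, 3)
            result.append((midpoint, pick_other(current[1])))
    if entries:
        result.append(entries[-1])
    return result


# Hypothetical data: shape "A" is requested twice, 1.2 s apart.
timeline = [(0.19, "A"), (1.39, "A"), (2.0, "N")]
filled = insert_intermediate(
    timeline, max_hold=0.5, pick_other=lambda s: "N" if s != "N" else "A"
)
```

Here the threshold of 0.5 s and the fallback shape are placeholders chosen only to make the example run.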

In another preferred aspect, the transition information includes information indicating the timing at which the change of the mouth shape starts, and the transition time indicated by the transition information runs from that timing to the timing at which the mouth takes the changed shape.

In another preferred aspect, when the consonant of a syllable indicated by the pronunciation information is a specific consonant, the generation means generates control information that further has shape information indicating a mouth shape corresponding to that consonant and timing information for that shape information. The generation means determines this timing information as a timing that precedes, by a predetermined time, the timing at which the mouth takes the shape indicated by the shape information corresponding to the vowel of the syllable.

The present invention also provides a shape control device that controls the shape of the mouth of a device having a mouth-like part, comprising: sound generation control means for controlling the pronunciation content of a sound generation device on the basis of pronunciation information that defines a plurality of syllables to be pronounced and their pronunciation timings; output means for outputting shape information indicating a shape of the mouth on the basis of the pronunciation information while the sound generation control means is performing sound generation control based on that pronunciation information, the output means determining the shape information according to the vowel of each syllable indicated by the pronunciation information and outputting the shape information a predetermined transition time before the pronunciation timing of that syllable; and shape control means for controlling the device so that, when shape information is output from the output means, the shape of the mouth changes over the transition time to the shape indicated by that shape information. When the shape indicated by one piece of shape information is the same as the shape indicated by the next piece of shape information, and the time between the pronunciation timings associated with them is longer than a predetermined time, the output means determines shape information indicating a mouth shape different from that shape and determines a timing between those pronunciation timings as the pronunciation timing associated with that shape information.

According to the present invention, changes in the mouth shape can be made natural while the mouth shape of a device having a mouth-like part is controlled on the basis of pronunciation information that defines the pronunciation content of syllables.

FIG. 1 is a block diagram showing the configuration of the shape control system 1 according to the first embodiment.
FIG. 2 is a diagram illustrating an example of pronunciation information according to the first embodiment.
FIG. 3 is a diagram illustrating shape correspondence information according to the first embodiment.
FIG. 4 is a diagram illustrating shape information according to the first embodiment.
FIG. 5 is a block diagram showing the configuration of the shape control function according to the first embodiment.
FIG. 6 is a diagram illustrating control information generated on the basis of the pronunciation information shown in FIG. 2.
FIG. 7 is a diagram illustrating the operation of the sound generation device and the shape changing device when the pronunciation information and control information according to the first embodiment are used.
FIG. 8 is a diagram illustrating another example of pronunciation information according to the first embodiment.
FIG. 9 is a diagram illustrating another example of control information, generated on the basis of the pronunciation information shown in FIG. 8.
FIG. 10 is a diagram illustrating the operation of the sound generation device and the shape changing device when the pronunciation information shown in FIG. 8 and the control information shown in FIG. 9 are used.
FIG. 11 is a block diagram showing the configuration of the shape control function according to the second embodiment.
FIG. 12 is a diagram illustrating an example of the content output from the output unit on the basis of the pronunciation information shown in FIG. 2.
FIG. 13 is a diagram illustrating the operation of the sound generation device and the shape changing device when the pronunciation information shown in FIG. 2 and the output content shown in FIG. 12 are used.
FIG. 14 is a diagram illustrating an example of control information according to Modification 1.
FIG. 15 is a diagram illustrating the operation of the sound generation device and the shape changing device when the pronunciation information shown in FIG. 2 and the control information shown in FIG. 14 are used.
FIG. 16 is a diagram illustrating an example of pronunciation information according to Modification 2.
FIG. 17 is a diagram illustrating eyelid control information generated on the basis of the pronunciation information shown in FIG. 16.
FIG. 18 is a diagram illustrating the operation of the sound generation device 20 and the shape changing device 30 when the pronunciation information shown in FIG. 16 and the eyelid control information shown in FIG. 17 are used.
FIG. 19 is a diagram illustrating shape information according to Modification 2.

<First Embodiment>
[Overall Configuration]
FIG. 1 is a block diagram showing the configuration of a shape control system 1 according to the first embodiment of the present invention. The shape control system 1 has a sound generation device 20 that utters words, a shape changing device 30 in which the shape of a mouth-like part changes, and a shape control device 10 that controls both the pronunciation content of the sound generation device 20 and the shape changes in the shape changing device 30. The shape control device 10 performs control so that the shape of the mouth of the shape changing device 30 changes in accordance with the pronunciation content of the sound generation device 20.

The sound generation device 20 has a sound source and a DSP (Digital Signal Processor) that generate a speech waveform signal under control from the shape control device 10, and a speaker that emits the generated speech waveform signal. In this example, the shape changing device 30 is a humanoid robot whose face has a part imitating a mouth (hereinafter simply called the "mouth"). The shape changing device 30 changes the shape of this mouth under control from the shape control device 10. Next, each component of the shape control device 10 is described.

[Configuration of Shape Control Device 10]
The shape control device 10 has a control unit 11, a storage unit 12, an operation unit 13, a display unit 14, and an interface 15. These elements are connected to one another via a bus.
The control unit 11 has a CPU (Central Processing Unit), a ROM (Read Only Memory), a RAM (Random Access Memory), and the like. The CPU loads a control program stored in the ROM into the RAM and executes it, thereby controlling each part of the shape control device 10 via the bus and realizing various functions. In addition, by executing a shape control program stored in the storage unit 12, the ROM, or the like, the control unit 11 realizes a shape control function for controlling the sound generation device 20 and the shape changing device 30. The RAM also serves as a work area when the CPU processes data.

The storage unit 12 is storage means such as a nonvolatile memory or a hard disk, and stores pronunciation information, shape correspondence information, and the control information generated by the shape control function. The control program described above may be stored in the storage unit 12 instead of the ROM. The storage unit 12 may also be external storage means, such as an external nonvolatile memory, connected via a connection interface or the like. The pronunciation information and shape correspondence information stored in the storage unit 12 are described next.

FIG. 2 is a diagram illustrating an example of pronunciation information according to the first embodiment. The pronunciation information defines the content to be uttered by the sound generation device 20. As shown in FIG. 2, the pronunciation information divides the words to be uttered into syllables and defines, for each syllable, the pronunciation "timing", the pronunciation "time" (duration, in sec.), and the "pitch" (corresponding to a note number in MIDI (Musical Instrument Digital Interface) data or the like). The pronunciation timing is expressed as the elapsed time (sec.) from the instruction to start sound generation. The pronunciation information shown in FIG. 2 also lists the "character" corresponding to each syllable and the phonemes of that character ("consonant", "vowel"), but these are included only to make the content of each syllable easy to understand. The pronunciation information therefore need not contain all of these items; it only needs to contain information from which each syllable can be identified.
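One way to picture a row of the pronunciation information is as a small record. The sketch below is an assumption-laden illustration in Python: the class name `Syllable`, its field names, and every numeric value are hypothetical and are not copied from FIG. 2.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Syllable:
    char: str                 # syllable label, e.g. "sa"
    consonant: Optional[str]  # None when the syllable has no consonant
    vowel: str                # "a", "i", "u", "e", "o" or "(n)"
    timing: float             # seconds elapsed since the start instruction
    duration: float           # the "time" column: sounding length in seconds
    pitch: int                # MIDI note number

    @property
    def end(self) -> float:
        """Time at which the syllable stops sounding."""
        return round(self.timing + self.duration, 3)


# Illustrative values only; they are not copied from FIG. 2.
pronunciation_info: List[Syllable] = [
    Syllable("sa", "s", "a", timing=0.2, duration=0.6, pitch=60),
    Syllable("i", None, "i", timing=0.8, duration=0.6, pitch=62),
    Syllable("ta", "t", "a", timing=1.4, duration=0.6, pitch=64),
]
```

Storing the consonant and vowel explicitly mirrors the explanatory columns of the figure; as the text notes, any representation from which the syllable can be identified would do.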

FIG. 3 is a diagram illustrating shape correspondence information according to the first embodiment. The shape correspondence information associates phonemes with shape information indicating the shape of the mouth of the shape changing device 30. The phonemes associated with shape information are the vowels ("a", "i", "u", "e", "o"), some consonants ("m", "p", "b", etc.; hereinafter collectively called "labials"), and "(n)", which corresponds to the character "n" (ん). To make clear that "(n)" differs from the consonant "n" of the "na" row, FIG. 3 treats it as a vowel for convenience. In this example, six types of shape information (A, I, U, E, O, N) are defined. The shape information "A", "I", "U", "E", and "O" corresponds to the vowels "a", "i", "u", "e", and "o", respectively. The shape information N corresponds to the labials and "(n)".
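The correspondence just described is essentially a lookup table. A minimal Python sketch follows; the names `SHAPE_FOR_PHONEME` and `shape_for` are hypothetical, and only the labials the text names explicitly ("m", "p", "b") are listed, since the source does not enumerate the rest.

```python
from typing import Optional

# Phoneme -> shape information, following the description of FIG. 3.
SHAPE_FOR_PHONEME = {
    "a": "A", "i": "I", "u": "U", "e": "E", "o": "O",  # vowels
    "(n)": "N",                                        # treated as a vowel
    "m": "N", "p": "N", "b": "N",                      # labials named in the text
}


def shape_for(phoneme: str) -> Optional[str]:
    """Return the shape label for a phoneme, or None if it has no entry."""
    return SHAPE_FOR_PHONEME.get(phoneme)
```

Phonemes without an entry, such as the consonant "s", return `None`, which reflects that only some consonants are associated with a mouth shape.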

FIG. 4 is a diagram illustrating shape information according to the first embodiment. As shown in FIG. 4, the shape information "A", "I", "U", "E", "O", and "N" indicates the mouth shapes corresponding to the sounds "a", "i", "u", "e", "o", and "n" (あ, い, う, え, お, ん), respectively.

Returning to FIG. 1, the description continues. The operation unit 13 is operation means such as a keyboard, a mouse, or a touch sensor provided on the surface of the display screen of the display unit 14. In response to a user operation, the operation unit 13 outputs operation data indicating the content of the operation to the control unit 11. Examples of operations are an instruction to start sound generation and an instruction to generate control information.
The display unit 14 is display means, such as a liquid crystal display, having a display screen that displays images under the control of the control unit 11.

The interface 15 is, for example, a connection terminal for wired connection to an external device, wireless connection means for wireless connection, or communication means for connection via a base station or a network, and exchanges various data with the connected external device. In this example, the sound generation device 20 and the shape changing device 30 are connected to the interface 15. The shape control device 10 is thereby connected to the sound generation device 20 and the shape changing device 30 and controls each of them.
This concludes the description of the configuration of each part of the shape control device 10.

[Configuration of Shape Control Function]
Next, the shape control function realized by the control unit 11 executing the shape control program is described with reference to FIG. 5. Some or all of the components that realize the shape control function described below may instead be implemented in hardware.

FIG. 5 is a block diagram showing the configuration of the shape control function according to the first embodiment. By executing the shape control program, the control unit 11 configures a pronunciation information acquisition unit 111, a sound generation control unit 112, a generation unit 113, and a shape control unit 114, and thereby realizes the shape control function.
The pronunciation information acquisition unit 111 acquires pronunciation information from the storage unit 12 in response to an instruction given by a user operation or the like. When the instruction includes an instruction to generate control information, the pronunciation information acquisition unit 111 outputs the acquired pronunciation information to the generation unit 113; when the instruction includes an instruction to start sound generation, it outputs the pronunciation information to the sound generation control unit 112.

The sound generation control unit 112 controls the sound generation device 20 via the interface 15 so that it utters the content corresponding to the pronunciation information obtained from the pronunciation information acquisition unit 111. Uttering the content corresponding to the pronunciation information means converting each syllable indicated by the pronunciation information into a speech waveform signal by speech synthesis and emitting that signal at the timing, for the time, and at the pitch associated with the syllable. Various known speech synthesis techniques can be used; in this example, the technique described in Japanese Patent No. 3879402 is used. In this technique, the pronunciation timing indicates the timing at which the vowel is sounded. Therefore, for a syllable that has a consonant as a phoneme, the sound generation control unit 112 controls the sound generation device 20 so that the speech waveform signal corresponding to the consonant is emitted a predetermined time (0.01 seconds in this example) earlier.
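The consonant lead can be illustrated with a short Python sketch. It only computes when each phoneme of a syllable would start; it does not perform speech synthesis, and the names `CONSONANT_LEAD` and `waveform_schedule` are hypothetical.

```python
from typing import List, Optional, Tuple

CONSONANT_LEAD = 0.01  # seconds; the value used in this example of the text


def waveform_schedule(
    consonant: Optional[str], vowel: str, timing: float
) -> List[Tuple[float, str]]:
    """Return (start time, phoneme) pairs for one syllable.

    The timing in the pronunciation information is the vowel timing, so a
    consonant, when present, starts CONSONANT_LEAD seconds earlier.
    """
    schedule: List[Tuple[float, str]] = []
    if consonant is not None:
        schedule.append((round(timing - CONSONANT_LEAD, 3), consonant))
    schedule.append((timing, vowel))
    return schedule
```

For a syllable "sa" with vowel timing 0.2, the consonant would be scheduled at 0.19 and the vowel at 0.2; a syllable without a consonant yields a single entry.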

The sound generation control unit 112 also outputs, to the shape control unit 114, synchronization information indicating which part of the pronunciation information is currently being uttered. This allows the control by the sound generation control unit 112 and the control by the shape control unit 114 to be performed in synchronization. The sound generation control unit 112 or the shape control unit 114 may correct the synchronization information to account for their respective control delays.

The generation unit 113 generates control information on the basis of the pronunciation information obtained from the pronunciation information acquisition unit 111 and the shape correspondence information stored in the storage unit 12, and stores it in the storage unit 12. The control information defines the shape information, the timing at which the mouth is to take the shape indicated by that shape information, and the transition time from before to after a change in the mouth shape. The generation unit 113 determines each of these items on the basis of the pronunciation information and the shape correspondence information. As in the pronunciation information, the timing is the elapsed time from the instruction to start sound generation.

When generating control information, the generation unit 113 first associates the shape information N with the timing "0.00" as the initial state. The generation unit 113 then refers to the pronunciation information, determines the shape information corresponding to each syllable, and uses the timing associated with that syllable as the timing at which the shape is changed. If the syllable has a consonant, the generation unit 113 moves this timing 0.01 seconds earlier. This correction is made because, as described above, when a syllable has a consonant the sound generation control unit 112 emits the speech waveform signal corresponding to the consonant a predetermined time earlier.

In addition, when the timings and times defined in the pronunciation information show that there is a period during which no syllable is sounded, the generation unit 113 associates the start timing of that period with the shape information N and adds this to the control information, so that the mouth is closed during that period.

The generation unit 113 further determines a transition time for each piece of shape information other than the initial value, according to a predetermined algorithm. In this example, the generation unit 113 sets the transition time to 0.01 seconds when the mouth is closed for a period in which no syllable is sounded or is opened from the closed state, and to 0.02 seconds for any other change in mouth shape.
This algorithm can take various forms. For example, the generation unit 113 may use the same transition time for every change in mouth shape, or may choose the transition time at random within a predetermined range. If all transition times are the same, the value may be set in advance in the shape control unit 114 or the shape changing device 30, in which case the control information need not contain information indicating the transition time. The generation unit 113 may also vary the transition time according to, for example, the time in the pronunciation information or the type of syllable, or may use a time designated by a user operation.
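The generation steps described above (initial state N at 0.00, shape chosen from the vowel, a 0.01-second advance for syllables with a consonant, N at the start of each silent period, and the 0.01/0.02-second transition rule) can be sketched in Python as follows. This is one reading of the example algorithm, not the patented implementation: all names are hypothetical, the syllable values are illustrative assumptions, and the way a silent period is detected (next timing later than the previous syllable's end) is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional

CLOSED = "N"            # mouth-closed shape
CONSONANT_LEAD = 0.01   # seconds by which a consonant precedes the vowel timing
EPS = 1e-6              # tolerance for floating-point comparisons
VOWEL_SHAPE = {"a": "A", "i": "I", "u": "U", "e": "E", "o": "O", "(n)": "N"}


@dataclass
class Syllable:
    consonant: Optional[str]
    vowel: str
    timing: float    # vowel timing, seconds from the start instruction
    duration: float  # sounding time in seconds


@dataclass
class ControlEntry:
    timing: float
    shape: str
    transition: Optional[float]  # None for the initial state


def generate_control_info(syllables: List[Syllable]) -> List[ControlEntry]:
    entries = [ControlEntry(0.0, CLOSED, None)]  # initial state: N at 0.00
    prev_end = 0.0

    def add(timing: float, shape: str) -> None:
        # 0.01 s when closing the mouth or opening it from closed, else 0.02 s.
        involves_closed = CLOSED in (entries[-1].shape, shape)
        transition = 0.01 if involves_closed else 0.02
        entries.append(ControlEntry(round(timing, 3), shape, transition))

    for syl in syllables:
        # A silent period before this syllable closes the mouth at its start.
        if syl.timing - prev_end > EPS and entries[-1].shape != CLOSED:
            add(prev_end, CLOSED)
        lead = CONSONANT_LEAD if syl.consonant else 0.0
        add(syl.timing - lead, VOWEL_SHAPE[syl.vowel])
        prev_end = syl.timing + syl.duration
    if entries[-1].shape != CLOSED:
        add(prev_end, CLOSED)  # close the mouth after the last syllable
    return entries


# Illustrative syllables ("sa", "i", "ta"); the values are assumptions.
control_info = generate_control_info([
    Syllable("s", "a", 0.2, 0.6),
    Syllable(None, "i", 0.8, 0.6),
    Syllable("t", "a", 1.4, 0.6),
])
for entry in control_info:
    print(entry.timing, entry.shape, entry.transition)
```

The sketch picks shapes from vowels only; handling of labial consonants and of long runs of identical shapes is deliberately left out.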

FIG. 6 is a diagram illustrating control information generated on the basis of the pronunciation information shown in FIG. 2. The control information shown in FIG. 6 is generated from the pronunciation information shown in FIG. 2 and the shape correspondence information shown in FIG. 3. Of the syllables indicated by the pronunciation information in FIG. 2, the characters "sa" and "ta" have a consonant, while "i" does not. Therefore, for the shape information "A" corresponding to "sa", for example, the timing is determined not as "0.2" but as "0.19", which is 0.01 seconds earlier. The transition times are determined according to the algorithm described above: "0.01" for the shape information "A" at timing "0.19", whose preceding shape information is "N", and for the shape information "N" at timing "2.00", and "0.02" for the others.

Returning to FIG. 5, the description continues. The shape control unit 114 controls the shape changing device 30 via the interface 15 so that the mouth shape changes on the basis of the control information stored in the storage unit 12. This control information is the control information that the generation unit 113 generated from the pronunciation information used by the sound generation control unit 112 for its control.
At this time, using the synchronization information output from the sound generation control unit 112, the shape control unit 114 controls the shape changing device 30 in synchronization with the control of the sound generation device 20 that the sound generation control unit 112 performs on the basis of the pronunciation information. Synchronization here means that the control based on the pronunciation information in the sound generation control unit 112 and the control based on the control information in the shape control unit 114 are processed along the same time axis. That is, processing for which the timing indicated by the pronunciation information and the timing indicated by the control information have the same value is performed at the same time.

When changing the shape of the mouth, the shape control unit 114 controls the shape changing device 30 so that the time taken to move from the pre-change mouth shape to the post-change shape equals the transition time that the control information associates with the post-change shape information. For example, in the control information shown in FIG. 6, the shape information “I” at timing “0.8” has a transition time of “0.02”. This means that the transition from the mouth shape of shape information “A” to that of shape information “I” takes 0.02 seconds, and that the shape change starts at 0.78 seconds, which is 0.02 seconds before 0.8 seconds. In other words, the “transition time” in the control information indicates not only how long the change to the corresponding mouth shape takes but also, in combination with the “timing” in the control information, when the change starts.
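
The relationship between timing, transition time, and transition start can be written out as a one-line calculation. The helper below is a hypothetical sketch (the function name is not from the source); rounding is used only to keep floating-point noise out of the result.

```python
def transition_window(timing, transition_time):
    """Return (start, end) of the transition that ends at `timing`.

    The shape is fully reached at `timing`, so the change has to start
    `transition_time` seconds earlier.
    """
    start = round(timing - transition_time, 6)
    return start, timing
```

For the entry discussed above (timing 0.8, transition time 0.02), `transition_window(0.8, 0.02)` gives a window from 0.78 to 0.8.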

During the transition period from the pre-change mouth shape to the post-change mouth shape, the shape control unit 114 performs control so that the mouth shape changes gradually. To do so, the shape control unit 114 determines the mouth shape at each point in the transition period by interpolating between the mouth shapes before and after the change. If the shape changing device 30 itself has such an interpolation function, the generation unit 113 generates control information used for the control that changes the shape in the shape changing device 30, and the shape control unit 114 need not perform the gradual-change processing.
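
The interpolation during the transition period might look like the sketch below. The source only says that intermediate shapes are interpolated from the shapes before and after the change; the linear blend and the representation of a mouth shape as a list of numeric parameters (for example, actuator positions) are assumptions made here for illustration.

```python
def interpolate_shape(before, after, start, end, t):
    """Linearly interpolate a mouth shape at time t.

    before, after: equal-length lists of numeric shape parameters
                   (hypothetical representation, e.g. actuator positions).
    start, end:    start and end time of the transition period.
    """
    if t <= start:
        return list(before)
    if t >= end:
        return list(after)
    ratio = (t - start) / (end - start)  # 0.0 at start, 1.0 at end
    return [b + (a - b) * ratio for b, a in zip(before, after)]
```

A device with a built-in interpolation function would not need this step on the controller side, as noted above.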
This concludes the description of the shape control function. Next, an operation example of the sound generation device 20 and the shape changing device 30 controlled by the shape control function is described with reference to FIG. 7.

[Operation example]
FIG. 7 is a diagram illustrating the operation of the sound generation device 20 and the shape changing device 30 when the pronunciation information and control information according to the first embodiment are used. In this example, the pronunciation information is assumed to have the contents shown in FIG. 2, and the control information generated from it the contents shown in FIG. 6.
The chart in the upper part of FIG. 7, which shows the relationship between pitch and time, represents what the pronunciation information specifies. The “phoneme timing” shown below it represents, as phonemes, what the sound generation device 20 produces. As described above, for a syllable that has a consonant, the speech waveform signal of the vowel is emitted at the timing indicated by the pronunciation information, and the speech waveform signal of the consonant is emitted a predetermined time (0.01 seconds) before that timing.

The “mouth shape” shown below the phoneme timing represents the mouth shape of the shape changing device 30 in terms of shape information. For example, a period labeled “N” is a period in which the mouth is controlled to the closed shape corresponding to shape information N. Hatched portions indicate transition periods in which the mouth shape is changing. In FIG. 7, transition XY denotes the period of transition from the mouth shape indicated by shape information X to the mouth shape indicated by shape information Y. For example, transition NA denotes the period of transition from the mouth shape indicated by shape information N to that indicated by shape information A.
The shape information shown below the “mouth shape” represents the relationship between the shape information indicated by the control information and its timing.

As described above, in the shape control system 1 according to the embodiment of the present invention, the sound generation device 20 and the shape changing device 30, which has a part imitating a mouth, are controlled using the shape control device 10. The shape control device 10 controls the sound generation device 20 based on the pronunciation information, generates control information from the pronunciation information, and controls the shape changing device 30 using that control information. When generating the control information from the pronunciation information, the shape control device 10 determines, according to a predetermined algorithm, transition information indicating the transition time from before to after each change of mouth shape. When changing the mouth shape, the shape control device 10 controls the shape changing device 30 so that the shape changes gradually over the transition time.
This allows the mouth shape to change in a way that looks natural for the content being pronounced. The shape control device 10 can also generate the control information used for changing the shape in this way.

[Operation example 2]
Next, an operation example of the sound generation device 20 and the shape changing device 30 controlled by the shape control function is described for a case where the pronunciation information differs from that shown in FIG. 2.

FIG. 8 is a diagram illustrating another example of the pronunciation information according to the first embodiment. FIG. 9 is a diagram illustrating the control information generated from the pronunciation information shown in FIG. 8. The pronunciation information in FIG. 8 differs from that in FIG. 2 mainly in that it contains a syllable with a labial sound and a syllable with a long pronunciation time. The generation unit 113 generates the control information shown in FIG. 9 from the pronunciation information shown in FIG. 8.

When generating control information from such pronunciation information, the generation unit 113 performs the processing described below in addition to the processing described above. If there is a syllable whose pronunciation time exceeds a predetermined time Ta (0.8 seconds in this example) and the shape information corresponding to that syllable is the same as that corresponding to the next syllable, the generation unit 113 generates the control information so that, starting a predetermined time Ts (0.3 seconds in this example) before the timing of the shape information for the next syllable, the mouth takes a shape with a smaller opening than the shape indicated by that shape information. In this example, shape information indicating a mouth shape with a reduced opening is written by appending “(s)” to the original shape information. For example, for shape information “A”, the shape information indicating the reduced-opening mouth shape is written “A(s)”.
The shape information “A(s)”, “I(s)”, “U(s)”, “E(s)”, and “O(s)” indicating reduced-opening mouth shapes need not be generated by the generation unit 113. It may instead be stored in the storage unit 12, in which case the generation unit 113 simply acquires the corresponding shape information from the storage unit 12. In addition, the shape information “A(s)”, “I(s)”, “U(s)”, “E(s)”, and “O(s)” need not represent a mouth shape with a reduced opening; it only has to differ from the mouth shape indicated by the shape information “A”, “I”, “U”, “E”, and “O”.

In this case too, the generation unit 113 determines the transition time (0.01 seconds in this example) for the shape information indicating the reduced-opening mouth shape according to a predetermined algorithm.
Note that the pronunciation time of the character “ta” is equal to or longer than the time Ta, but the corresponding shape information “A” differs from the shape information “N” that follows it, so the processing that reduces the mouth opening is not performed.
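
The long-syllable rule described above can be sketched as follows. The values of Ta and Ts come from the text; the function name, the tuple layout, and the omission of transition times are simplifications assumed for this illustration.

```python
TA = 0.8  # seconds; pronunciation-time threshold from the text
TS = 0.3  # seconds; lead before the next syllable's shape timing, from the text


def add_reduced_openings(syllables):
    """Insert reduced-opening entries such as "A(s)" where the rule applies.

    syllables: list of (timing, duration, shape), sorted by timing.
    Returns a list of (timing, shape) entries; transition times are omitted.
    """
    entries = []
    for i, (timing, duration, shape) in enumerate(syllables):
        entries.append((timing, shape))
        if i + 1 < len(syllables):
            next_timing, _, next_shape = syllables[i + 1]
            # Long syllable followed by the same shape: narrow the mouth
            # Ts seconds before the next syllable's shape timing.
            if duration > TA and shape == next_shape:
                entries.append((round(next_timing - TS, 3), shape + "(s)"))
    return entries
```

A long syllable followed by a different shape (as with “ta” followed by “N” above) is left unchanged, because the shape comparison fails.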

When the syllables indicated by the pronunciation information include one with a labial sound, the generation unit 113 generates control information in which the shape information for that syllable is determined not as the shape information corresponding to the syllable's vowel, as described above, but as the shape information “N” corresponding to the labial sound. That is, the timing corresponding to this shape information “N” is determined as the timing at which the speech waveform signal of the syllable's consonant is emitted.
The generation unit 113 then generates the control information using the timing corresponding to the syllable, that is, the timing at which the speech waveform signal of the vowel is emitted, as the timing at which the mouth takes the shape corresponding to that vowel. In this example, the transition time from the shape information “N” for the labial sound to the shape information for the vowel is 0.01 seconds. The generation unit 113 may instead set the timing for the vowel's mouth shape earlier than the timing at which the vowel's speech waveform signal is emitted. In that case, the generation unit 113 also makes the corresponding transition time shorter.
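
The handling of a labial syllable can be illustrated with the sketch below. The 0.01-second consonant lead and the 0.01-second transition from “N” to the vowel shape are taken from the text; the function name and dictionary layout are hypothetical, and the transition time into “N” is left out because it would come from the general algorithm.

```python
CONSONANT_LEAD = 0.01              # seconds; consonant sounds this much before the vowel
LABIAL_TO_VOWEL_TRANSITION = 0.01  # seconds; value used in the example above


def entries_for_labial_syllable(syllable_timing, vowel_shape):
    """Return the two control entries for a syllable with a labial consonant."""
    consonant_timing = round(syllable_timing - CONSONANT_LEAD, 3)
    return [
        # Mouth closed when the consonant is emitted.
        {"timing": consonant_timing, "shape": "N"},
        # Vowel shape reached when the vowel is emitted.
        {"timing": syllable_timing, "shape": vowel_shape,
         "transition_time": LABIAL_TO_VOWEL_TRANSITION},
    ]
```

Because the vowel entry's transition starts exactly at the consonant timing, the closed shape is held for no time at all and the mouth begins to reopen immediately.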

FIG. 10 is a diagram illustrating the operation of the sound generation device 20 and the shape changing device 30 when the pronunciation information shown in FIG. 8 and the control information shown in FIG. 9 are used. A detailed description of FIG. 10 is omitted because it is the same as for FIG. 7. Note that the shape information “N” corresponding to the character “ma” immediately begins its transition to shape information “A”, so in FIG. 10 transition AN is followed directly by transition NA.

In this way, when controlling the shape changing device 30 so that the same mouth shape continues, the shape control device 10 changes the mouth shape so that the opening becomes smaller, which makes the change of mouth shape look more natural for the content being pronounced. In addition, when a syllable to be pronounced has a labial sound, changing the mouth so that it closes momentarily makes the change of mouth shape look still more natural.

＜Second Embodiment＞
In the first embodiment, the shape control device 10 controlled the mouth shape of the shape changing device 30 using control information generated in advance by the generation unit 113. In the second embodiment, the shape control device 10 controls the mouth shape of the shape changing device 30 in real time while controlling the sound produced by the sound generation device 20 based on the pronunciation information.
The configuration of the shape control function in the second embodiment is described below.

[Configuration of the shape control function]
FIG. 11 is a block diagram showing the configuration of the shape control function according to the second embodiment. By executing the shape control program, the control unit 11 configures a pronunciation information acquisition unit 111A, a sound generation control unit 112A, a shape control unit 114A, and an output unit 115, thereby realizing the shape control function.

The pronunciation information acquisition unit 111A acquires pronunciation information from the storage unit 12 in response to an instruction such as a user operation, and outputs the acquired pronunciation information to the output unit 115 and the sound generation control unit 112A.
The sound generation control unit 112A performs the same processing as the sound generation control unit 112 in the first embodiment, except that it outputs the synchronization information, which indicates which part of the pronunciation information is currently being sounded, to the output unit 115. This allows the control by the sound generation control unit 112A and the output from the output unit 115 to be synchronized.
On acquiring the shape information and the information indicating the transition time output from the output unit 115, the shape control unit 114A controls the shape changing device 30 so that the mouth starts changing toward the shape indicated by the shape information, changes gradually, and reaches that shape when the transition time has elapsed.

The output unit 115 outputs shape information and information indicating the transition time to the shape control unit 114A, based on the pronunciation information acquired from the pronunciation information acquisition unit 111A, the shape correspondence information stored in the storage unit 12, and the synchronization information output from the sound generation control unit 112A.
When sound generation control by the sound generation control unit 112A starts, the output unit 115 first outputs the shape information “N” as an initial value. Then, at a timing that precedes the timing at which a syllable is sounded by the sound generation device 20 by the transition time, the output unit 115 outputs the shape information corresponding to that syllable and the information indicating its transition time to the shape control unit 114A.
The output unit 115 determines the transition time according to a predetermined algorithm, which can take various forms as described in the first embodiment. If the transition time does not vary with the shape information and is a fixed predetermined time, the transition time can be set in the shape control unit 114A in advance, and the output unit 115 then need not output information indicating the transition time.
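
The output timing of the output unit 115 can be sketched as a simple schedule computation. This is an offline illustration of a process the text describes as running in real time; the function name, the tuple layout, and the transition time of 0.0 attached to the initial “N” are assumptions.

```python
def output_schedule(entries, initial_shape="N"):
    """Compute when each shape would be output to the shape control unit.

    entries: list of (timing, shape, transition_time), where `timing` is when
             the shape must be fully reached. Sorted by timing.
    Returns a list of (output_time, shape, transition_time).
    """
    # The initial shape is output as soon as sound generation control starts.
    schedule = [(0.0, initial_shape, 0.0)]
    for timing, shape, transition in entries:
        # Output early by the transition time so the shape is reached on time.
        schedule.append((round(timing - transition, 6), shape, transition))
    return schedule
```

For a shape “A” that must be reached at 0.19 with a transition time of 0.01, the schedule places the output at 0.18.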

FIG. 12 is a diagram illustrating an example of what the output unit outputs based on the pronunciation information shown in FIG. 2. The output contents in FIG. 12 are those output by the output unit 115 based on the pronunciation information shown in FIG. 2 and the shape correspondence information shown in FIG. 3. For example, at timing “0.18”, the output unit 115 outputs the shape information “A” and the transition time “0.01”.
This concludes the description of the shape control function in the second embodiment. Next, an operation example of the sound generation device 20 and the shape changing device 30 controlled by the shape control function in the second embodiment is described with reference to FIG. 13.

[Operation example]
FIG. 13 is a diagram illustrating the operation of the sound generation device 20 and the shape changing device 30 when the pronunciation information shown in FIG. 2 and the output contents shown in FIG. 12 are used. A detailed description of FIG. 13 is omitted because it is the same as for FIG. 7. The shape information shown in the lower part of FIG. 13 is the shape information output from the output unit 115 at the timings indicated by the arrows.
Thus, in the shape control system 1 according to the second embodiment, because the output unit 115 outputs shape information in step with the transition start timing, the shape control device 10 can control the sound generation device 20 and the shape changing device 30 in synchronization based on the pronunciation information, without generating control information.

＜Modifications＞
Embodiments of the present invention have been described above, but the present invention can also be implemented in various other forms, as follows.
[Modification 1]
In the first embodiment described above, the control information generated by the generation unit 113 contained information indicating the transition time, but the transition time may be specified in another way. As one such way, an example in which the transition time is specified using shape information is described with reference to FIG. 14.

FIG. 14 is a diagram illustrating an example of the control information according to Modification 1. The control information according to Modification 1 shown in FIG. 14 specifies the same control as the control information according to the first embodiment shown in FIG. 6. The generation unit 113 according to Modification 1 generates control information that contains, ahead of the shape information indicating the post-change shape, shape information indicating the pre-change shape associated with the timing at which the transition starts. That is, in the control information each piece of shape information is always specified as a set of two. The timing of the second entry of a set is thus the timing at which the mouth shape transition starts, and the period from that transition start timing to the timing of the next shape information (the transition end timing) is the transition time. In this way, as long as the control information contains information indicating the transition time, that information may be contained in any form.
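
One way to picture the paired representation of Modification 1 is as a conversion from entries that carry an explicit transition time into a flat list of (timing, shape) keyframes. The sketch below is an assumption about how such a list could be built; the exact layout of FIG. 14 is not reproduced here, and the function name and initial keyframe at time 0.0 are hypothetical.

```python
def to_keyframes(entries, initial_shape="N"):
    """Convert (timing, shape, transition_time) entries to paired keyframes.

    Each shape appears twice: once when it is reached, and once more when
    the transition to the next shape starts. The transition time is then
    implied by the gap between a start keyframe and the next keyframe.
    """
    keyframes = [(0.0, initial_shape)]
    prev_shape = initial_shape
    for timing, shape, transition in entries:
        # Second keyframe of the previous shape: transition starts here.
        keyframes.append((round(timing - transition, 3), prev_shape))
        # First keyframe of the new shape: transition ends here.
        keyframes.append((timing, shape))
        prev_shape = shape
    return keyframes
```

With this layout no separate transition-time field is needed, since it can be recovered by subtracting the start keyframe's timing from the following keyframe's timing.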

FIG. 15 is a diagram illustrating the operation of the sound generation device 20 and the shape changing device 30 when the pronunciation information shown in FIG. 2 and the control information shown in FIG. 14 are used. A detailed description of FIG. 15 is omitted because it is the same as for FIG. 7. As shown in FIG. 15, the start timing and end timing of each transition period are specified in the control information by being bracketed between the shape information indicating the pre-change shape and the shape information indicating the post-change shape.

[Modification 2]
In the embodiments described above, the shape control device 10 may also control shapes of the shape changing device 30 other than the mouth shape. For example, if the shape changing device 30 has parts imitating eyes (hereinafter simply “eyes”) and is configured so that the eyes can open and close, the shape control device 10 may control this opening and closing. The configuration of the shape control device 10 in this case is described below.

In Modification 2, the storage unit 12 stores shape information indicating eye shapes. In this example, the shape information “OPEN” and “CLOSE” indicate the eye shape in the open state and the closed state, respectively.
In Modification 2, the generation unit 113 generates eyelid control information based on the pronunciation information and stores it in the storage unit 12. The eyelid control information specifies shape information, the timing at which the eyes take the shape indicated by that shape information, and the transition time from before to after each change of eye shape. The generation unit 113 determines each of these items based on the pronunciation information and the shape correspondence information. As with the pronunciation information, the timing represents the time elapsed since the instruction to start sound generation.

When generating the eyelid control information, the generation unit 113 in this example sets the shape information to either “OPEN” or “CLOSE” based on the pronunciation time of each syllable indicated by the pronunciation information. The generation unit 113 determines the shape information according to a predetermined algorithm. In this example, the generation unit 113 randomly picks a time Tb from a specific range for each syllable and, if the pronunciation time of that syllable exceeds Tb, sets the shape information for the syllable to “CLOSE”. The generation unit 113 sets the shape information for syllables that do not satisfy this condition to “OPEN”.
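
The randomized eyelid decision can be sketched as below. The source does not give the range from which Tb is drawn, so `tb_range` is a caller-supplied, hypothetical parameter, and the uniform distribution is likewise an assumption.

```python
import random


def eyelid_shapes(durations, tb_range, rng=None):
    """Choose "OPEN" or "CLOSE" for each syllable.

    durations: pronunciation time of each syllable, in seconds.
    tb_range:  (low, high) range from which the threshold Tb is drawn
               per syllable (hypothetical values, not given in the text).
    rng:       optional random.Random instance, for reproducible results.
    """
    rng = rng or random.Random()
    low, high = tb_range
    shapes = []
    for duration in durations:
        tb = rng.uniform(low, high)  # a fresh threshold for every syllable
        shapes.append("CLOSE" if duration > tb else "OPEN")
    return shapes
```

Passing a seeded `random.Random` makes a run repeatable for testing, while the default unseeded generator gives the varying behavior the text is after.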

The generation unit 113 is not limited to this way of determining the shape information and can determine it in various ways. For example, the generation unit 113 may determine the shape information based on the types and combinations of the phonemes (consonants, vowels) that make up a syllable, the pitch, and the connection between neighboring syllables (the flow of the phrase). The generation unit 113 may also set the shape information to “CLOSE” with a fixed probability when the pronunciation time of a syllable exceeds a fixed time Tc.
Giving the determination of shape information in the generation unit 113 an element of randomness avoids the eyes always closing at the same place when the same phrase is repeated.
The generation unit 113 determines the transition time according to a predetermined algorithm, in the same way as the transition time in the control information of the first embodiment.

In Modification 2, the shape control unit 114 controls the shape changing device 30 via the interface 15 so as to open and close the eyes based on the eyelid control information stored in the storage unit 12. This eyelid control information is the eyelid control information that the generation unit 113 generated from the pronunciation information used by the sound generation control unit 112 for its control. The control that the shape control unit 114 performs based on the eyelid control information differs from the control based on the control information in the first embodiment only in the object being controlled, so a detailed description of it is omitted.
Next, the operation of the sound generation device 20 and the shape changing device 30 in Modification 2 is described using example pronunciation information.

FIG. 16 is a diagram illustrating an example of the pronunciation information according to Modification 2. FIG. 17 is a diagram illustrating the eyelid control information generated from the pronunciation information shown in FIG. 16. FIG. 18 is a diagram illustrating the operation of the sound generation device 20 and the shape changing device 30 when the pronunciation information shown in FIG. 16 and the eyelid control information shown in FIG. 17 are used. The “eye shape” shown in FIG. 18 represents the eye shape of the shape changing device 30 in terms of shape information.

FIG. 19 is a diagram illustrating shape information according to Modification 2. The example in FIG. 19 shows the eye shape and mouth shape for the combination of shape information “I” and “OPEN” and for the combination of shape information “I” and “CLOSE”.
In this way, the shape control device 10 may generate control information for controlling the shapes of various parts based on the pronunciation information, and may control not only the mouth shape of the shape changing device 30 but also the eye shape and the shapes of other parts.

[Modification 3]
In the embodiments described above, the shape changing device 30 was a humanoid robot, but it may instead have a display screen on which a human face or the like is displayed by CG. That is, the shape changing device 30 may be any device having a part imitating a mouth. The mouth is not limited to a human mouth; it may be an animal's mouth, or a mouth artificially given to a plant, structure, or other object that has no mouth.
When the display is on a display screen, as with CG, the display unit 14 may perform the display instead of the shape changing device 30.

[Modification 4]
In the first embodiment described above, the shape control device 10 had both the generation unit 113, which generates control information, and the shape control unit 114, but their processing may be performed by separate devices. That is, the shape control device 10 may be composed of a control information generation device, which realizes a control information generation function having the pronunciation information acquisition unit 111 and the generation unit 113 by having a CPU execute a generation program, and a device that realizes a control function having the shape control unit 114 by having a CPU execute a control program. The shape control unit 114 then controls the shape changing device 30 based on the control information generated by the generation unit 113.

[Modification 5]
In the first embodiment described above, for shape information corresponding to a syllable that has a consonant, the generation unit 113 may generate the control information so that the transition start timing, at which the change to the shape indicated by that shape information begins, is the phoneme timing of the consonant, and the transition end timing is the phoneme timing of the vowel. In the second embodiment, the output unit 115 may output the shape information and the information indicating the transition time so that the transition starts and ends at these timings.
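The timing rule of Modification 5 can be illustrated with a minimal sketch. The names `Syllable`, `ControlEntry`, `make_entry`, and the fallback value `DEFAULT_TRANSITION` are hypothetical and do not appear in the embodiments. The handling of syllables without a consonant is likewise only an assumption standing in for the predetermined algorithm.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed fallback transition time (seconds) for syllables with no consonant.
DEFAULT_TRANSITION = 0.1


@dataclass
class Syllable:
    vowel: str                              # vowel of the syllable, e.g. "a"
    vowel_time: float                       # phoneme timing of the vowel (s)
    consonant_time: Optional[float] = None  # phoneme timing of the consonant, if any


@dataclass
class ControlEntry:
    shape: str    # shape information, here simply the vowel label
    start: float  # transition start timing (s)
    end: float    # transition end timing (s)

    @property
    def transition_time(self) -> float:
        return self.end - self.start


def make_entry(s: Syllable, default_transition: float = DEFAULT_TRANSITION) -> ControlEntry:
    """Build one control entry following the rule of Modification 5."""
    end = s.vowel_time  # the mouth reaches the vowel shape at the vowel timing
    if s.consonant_time is not None:
        start = s.consonant_time          # the transition begins at the consonant timing
    else:
        start = end - default_transition  # assumption: fixed lead for vowel-only syllables
    return ControlEntry(shape=s.vowel, start=start, end=end)
```

For example, `make_entry(Syllable("a", vowel_time=1.0, consonant_time=0.8))` yields an entry whose transition runs from 0.8 s to 1.0 s, so the transition time equals the duration of the consonant.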

[Modification 6]
In the embodiments described above, when a syllable indicated by the pronunciation information contains a labial sound, the shape control device 10 controls the shape changing device 30 so that, before taking the mouth shape corresponding to the vowel, the mouth takes the closed shape indicated by the shape information "N". Similarly, for sounds other than labials where the shape of a person's mouth changes when the syllable is pronounced, for example syllables with specific consonants such as "wa" and "sa", the storage unit 12 may store shape information indicating the corresponding mouth shape. The generation unit 113 and the output unit 115 may then process this shape information in the same way that the shape information "N" is used for labial sounds.
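As a rough sketch of Modification 6, the stored shape information can be pictured as a lookup table keyed by consonant. The shape labels "W" and "S", the choice of labial consonants, the lead time `PRE_SHAPE_LEAD`, and the function name `shapes_for_syllable` are all assumptions made for illustration. Only the shape information "N" for labial sounds comes from the embodiments.

```python
from typing import List, Optional, Tuple

# Vowel shapes used in the embodiments.
VOWEL_SHAPES = {"a": "a", "i": "i", "u": "u", "e": "e", "o": "o"}

# Consonant -> shape taken before the vowel shape.
PRE_SHAPES = {
    "m": "N", "b": "N", "p": "N",  # labial sounds: closed mouth "N" (assumed consonant set)
    "w": "W",                      # hypothetical extra shape for "wa"-type syllables
    "s": "S",                      # hypothetical extra shape for "sa"-type syllables
}

# Assumed predetermined time (seconds) by which the consonant shape precedes the vowel shape.
PRE_SHAPE_LEAD = 0.05


def shapes_for_syllable(consonant: Optional[str], vowel: str,
                        timing: float) -> List[Tuple[float, str]]:
    """Return (timing, shape) pairs for one syllable, consonant shape first."""
    events = []
    pre = PRE_SHAPES.get(consonant) if consonant else None
    if pre is not None:
        # The consonant shape is scheduled a fixed time before the vowel shape.
        events.append((timing - PRE_SHAPE_LEAD, pre))
    events.append((timing, VOWEL_SHAPES[vowel]))
    return events
```

With this table, a syllable such as "ma" produces the closed shape "N" followed by the shape for "a", while a syllable such as "ka" produces only the vowel shape because "k" has no entry.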

[Modification 7]
In the embodiments described above, the shape control device 10 uses shape information corresponding to the vowels "a", "i", "u", "e", and "o", but it may also use shape information indicating mouth shapes corresponding to other, special vowels.

[Modification 8]
The shape control program in the embodiments described above may be provided stored on a computer-readable recording medium such as a magnetic recording medium (magnetic tape, magnetic disk, etc.), an optical recording medium (optical disc, etc.), a magneto-optical recording medium, or a semiconductor memory. The shape control device 10 may also download the shape control program via a network.

DESCRIPTION OF SYMBOLS
1 ... shape control system, 10 ... shape control device, 11 ... control unit, 12 ... storage unit, 13 ... operation unit, 14 ... display unit, 15 ... interface, 111, 111A ... pronunciation information acquisition unit, 112, 112A ... pronunciation control unit, 113 ... generation unit, 114, 114A ... shape control unit, 115 ... output unit, 20 ... sound generation device, 30 ... shape changing device

Claims

A control information generation device that generates control information used to control the shape of the mouth of a device, the device being controlled so that the shape of a part imitating a mouth changes in correspondence with the content sounded by a sound generation device that produces sound based on pronunciation information defining syllables and the sounding timing of each syllable, the control information generation device comprising:
pronunciation information acquisition means for acquiring the pronunciation information; and
generation means for generating, based on the pronunciation information acquired by the pronunciation information acquisition means, control information having shape information indicating a shape of the mouth, timing information indicating the timing at which the mouth is to take the shape indicated by the shape information, and transition information indicating the transition time from before the change to after the change when the shape changes, the generation means generating the control information by determining the shape information according to the vowel of a syllable indicated by the pronunciation information, determining the timing information according to the sounding timing of the syllable, and determining the transition information according to a predetermined algorithm,
wherein, when the shape indicated by one item of shape information and the shape indicated by the next item of shape information are the same, and the time between the timings indicated by the timing information for those items is longer than a predetermined time, the generation means determines shape information indicating a mouth shape different from that shape, together with timing information for that shape information indicating a timing between those timings.

The control information generation device according to claim 1, wherein the transition information includes information indicating the timing at which the change in the shape of the mouth starts, and
the transition time indicated by the transition information is the time from the timing indicated by that information to the timing at which the changed shape is reached.

The control information generation device according to claim 1, wherein, when the consonant of a syllable indicated by the pronunciation information is a specific consonant, the generation means generates control information further having shape information indicating a mouth shape corresponding to that consonant and timing information for that shape information, and determines that timing information as a timing a predetermined time before the timing at which the mouth is to take the shape indicated by the shape information corresponding to the vowel of the syllable.

A shape control device for controlling the shape of the mouth in a device having a part imitating a mouth, comprising:
pronunciation control means for controlling the content sounded by a sound generation device based on pronunciation information defining a plurality of syllables and their sounding timings;
output means for outputting shape information indicating a shape of the mouth based on the pronunciation information when the pronunciation control means performs sounding control based on the pronunciation information, the output means determining the shape information according to the vowel of a syllable indicated by the pronunciation information and outputting the shape information a predetermined transition time before the sounding timing of the syllable; and
shape control means for controlling the device, when the shape information is output from the output means, so as to change the shape of the mouth to the shape indicated by the shape information over the transition time,
wherein, when the shape indicated by one item of shape information and the shape indicated by the next item of shape information are the same, and the time between the sounding timings for those items is longer than a predetermined time, the output means determines shape information indicating a mouth shape different from that shape, and determines a timing between those sounding timings as the sounding timing for that shape information.