JP7156138B2

JP7156138B2 - Information processing device, light action generation method, and light action generation program

Info

Publication number: JP7156138B2
Application number: JP2019065702A
Authority: JP
Inventors: 勝井出; 雅芳清水
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2019-03-29
Filing date: 2019-03-29
Publication date: 2022-10-19
Anticipated expiration: 2039-03-29
Also published as: JP2020166500A

Description

The present invention relates to an information processing device, a light action generation method, and a light action generation program.

In recent years, communication devices used for purposes such as communication and entertainment have been developed. Communication devices include, for example, robots that operate by moving joints and the like, as well as smart speakers and AI (Artificial Intelligence) speakers that communicate with viewers by voice.

A communication device operates by, for example, causing a light source to emit light or moving its joints according to an action definition created by a creator. For example, by defining the light emission waveform of the light source and the movement of the joints to match the music played while the communication device operates, the creator can make the communication device dance to the music.

Also, for example, a communication device holds a conversation by outputting voice according to speech data that specifies the content to be spoken. A technique called lip sync, which generates a light emission waveform of a light source based on the spoken voice, is known. In lip sync, the light source emits light in conjunction with the spoken words, imitating the movement of a human mouth, so that the device appears to be speaking. Lip sync makes it easier for the viewer to understand what the communication device is saying. In the following, an action performed by light emission of the light source of the communication device may be referred to as a light action, regardless of whether the emission is defined by a creator or generated by lip sync.

In this regard, techniques related to communication devices are known (for example, Patent Literature 1 and Patent Literature 2).

Patent Literature 1: Japanese Laid-open Patent Publication No. 2017-226051
Patent Literature 2: Japanese National Publication of International Patent Application No. 2009-509673

However, the light emission intensity waveform of the light source in a light action created by a creator and that of a light action generated by lip sync may resemble each other, making the two difficult for the viewer to distinguish. As a result, the viewer may be unable to tell whether the light source is, for example, emitting light as a creator-defined light action or indicating that the communication device is speaking.

In one aspect, an object of the present invention is to generate a lip-sync light action that is distinguishable from a creator-defined light action.

An information processing device according to one aspect of the present invention includes: an identification unit that, for content including a defined action that causes a light source to emit light with a defined light emission waveform and a lip-sync action that causes the light source to emit light according to an uttered voice, analyzes the frequency components of the light emission waveform in the defined action and identifies a representative frequency component that represents the light emission waveform; and an adjustment unit that adjusts the frequency used in the light emission waveform of the light source in the lip-sync action to a frequency distinguishable from the representative frequency component.

A lip-sync light action that is distinguishable from a creator-defined light action can be generated.

A diagram illustrating the time-series operation of content executed by a communication device.
A diagram illustrating generation of a light emission waveform of a light source in an exemplary lip sync.
A diagram illustrating defined actions and a lip sync executed by the communication device.
A diagram illustrating a block configuration of a generation device that generates a light action for a lip-sync period according to an embodiment.
A diagram illustrating the flow of generating a light action by lip sync according to the embodiment.
A diagram illustrating adjustment of the light emission waveform of a light action during the lip-sync period according to the embodiment.
A diagram illustrating content information according to the embodiment.
A diagram illustrating an operation flow of light action generation processing in lip sync according to the embodiment.
A diagram illustrating an example of determining bias and deciding the frequency components to be suppressed based on the creator's light actions in a predetermined period according to the embodiment.
A diagram illustrating suppression control of the frequency components to be suppressed in lip sync according to the embodiment.
A diagram illustrating a block configuration of a communication device according to another embodiment.
A diagram illustrating an operation flow of light action generation processing in lip sync executed by a control unit of the communication device according to another embodiment.
A diagram illustrating a hardware configuration of an information processing device for implementing the generation device according to the embodiment.
A diagram illustrating a hardware configuration for implementing the communication device according to another embodiment.

Several embodiments of the present invention are described in detail below with reference to the drawings. Corresponding elements in the drawings are denoted by the same reference numerals.

FIG. 1 is a diagram illustrating, in chronological order, the operation of content executed by the communication device 100. The communication device 100 includes devices that communicate with a viewer using words, for example, a robot that operates by moving joints and the like, and smart speakers and AI speakers that communicate with the viewer by voice. In the example of FIG. 1, a robot is illustrated as the communication device 100.

The content defines, for example, a series of operations to be executed by the communication device 100. The content may include, for example, a lip-sync period and a defined action period. The content may also include a plurality of lip-sync periods and defined action periods.

The communication device 100 includes a light source 101 such as an LED (light emitting diode). During a lip-sync period, the communication device 100 outputs voice according to speech data indicating the content to be spoken. The communication device 100 also generates a lip-sync light emission waveform for the light source 101 based on the output voice and causes the light source 101 to emit light with the generated waveform. An action performed by light emission of the light source 101 during a lip-sync period may be referred to as a lip-sync action.

FIG. 2 is a diagram illustrating generation of a light emission waveform of the light source 101 in an exemplary lip sync. For example, as illustrated in FIG. 2(a), assume that the character string "みぎてのてんじぶつをごらんください" ("Please look at the exhibit on your right") is to be uttered during the lip-sync period.

In this case, as illustrated in FIG. 2(b), speech synthesis technology can be used to synthesize speech from the character string and generate speech waveform data. The speech waveform data is played back, for example, when the communication device 100 is made to speak.

Based on the speech waveform data, a light emission waveform of the light source 101 can be generated by lip sync. For example, the frequencies of the human voice are distributed in a range of approximately 100 Hz to 2000 Hz. On the other hand, humans are said to be able to perceive changes in light intensity only within a range of about 0.2 Hz to 50 Hz. It is therefore difficult to use the speech waveform data directly as the light emission waveform of the light source 101. In lip sync, to make the light source 101 emit light as if the communication device 100 were speaking to the viewer, the light source 101 is often driven in a range, such as 0.3 to 6 Hz, in which humans can easily perceive changes in light intensity. Processing is therefore performed to generate, from the human voice waveform in the 100 Hz to 2000 Hz range, a waveform in a frequency range suitable for lip sync, such as 0.3 to 6 Hz. In one example, a waveform suitable for lip sync can be generated by obtaining the envelope of the speech waveform data.

FIG. 2(c) is a diagram illustrating the envelope waveform obtained from the speech waveform data. As illustrated in FIG. 2(c), a waveform in the 0.3 to 6 Hz frequency range can be generated from speech data containing a human voice waveform in the 100 Hz to 2000 Hz range. In lip sync, for example, when the speech waveform data is played back, the intensity of the light source 101 is varied with a waveform obtained in this way, which is correlated with the speech waveform, so that the viewer perceives the communication device 100 as speaking. Lip sync thus makes it easier for the viewer to understand what the communication device 100 is saying.
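The envelope step can be sketched as follows. This is only an illustration under stated assumptions: the description says an envelope is obtained but does not say how, so the rectify-and-smooth approach, the one-pole filter, the 6 Hz cutoff default, and the names `envelope` and `demo_voice` are all choices made for this sketch. The test signal is synthetic, not real speech.

```python
import math


def envelope(samples, sample_rate, cutoff_hz=6.0):
    """Rectify the signal and smooth it with a one-pole low-pass filter.

    Assumption of this sketch: the patent text only says that an envelope
    is obtained, not which method is used.
    """
    dt = 1.0 / sample_rate
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)  # smoothing coefficient of the RC low-pass
    out = []
    level = 0.0
    for s in samples:
        # Follow the absolute value of the audio, but only slowly.
        level += alpha * (abs(s) - level)
        out.append(level)
    return out


def demo_voice(sample_rate=8000, seconds=2.0):
    """Hypothetical stand-in for speech: a 200 Hz tone whose loudness varies at 2 Hz."""
    n = int(sample_rate * seconds)
    return [
        (0.5 + 0.5 * math.sin(2 * math.pi * 2.0 * i / sample_rate))
        * math.sin(2 * math.pi * 200.0 * i / sample_rate)
        for i in range(n)
    ]


if __name__ == "__main__":
    rate = 8000
    env = envelope(demo_voice(rate), rate)
    # The envelope follows the slow 2 Hz loudness change, not the 200 Hz tone.
    print(f"envelope range: {min(env):.3f} .. {max(env):.3f}")
```

The resulting list can be read as a brightness curve for the light source: it rises and falls with loudness while the audible-rate oscillation is largely smoothed away.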

During a defined action period, the communication device 100 operates according to the action defined by the creator. The creator may define, for example, the angle of each joint and the light emission intensity of the light source 101 in association with the time elapsed from the start of the defined action, according to the intended use of the defined action. For example, when defining an action of dancing to music, the creator may define the dance posture of the communication device 100 at each point in time and make the communication device 100 dance. The creator may also define the light emission waveform of the light source 101 in the defined action. For example, the creator may define the light emission waveform of the light source 101 to match the music played while the communication device 100 operates, the movement of the communication device 100, the content of its speech, and the like.

However, as described above, the light emission waveform of the light source 101 in a defined action created by a creator and the light emission waveform of the light source 101 produced by lip sync may resemble each other, making them difficult for the viewer to distinguish. As a result, the viewer may be unable to tell whether the light source is, for example, emitting light as a creator-defined light action or indicating that the communication device is speaking.

FIG. 3 is a diagram illustrating defined actions and a lip sync that the communication device 100 executes in succession. FIG. 3(a) illustrates the operation of the communication device 100 in time series. FIG. 3(b) illustrates changes in the joint angle of the left arm of the communication device 100. During the period of defined action 1, the communication device 100 rotates its left arm joint until the arm is horizontal and then swings the left arm up and down. The communication device 100 then holds its posture during the lip-sync period and swings its left arm up and down again during the period of defined action 2.

As shown in the speech row of FIG. 3(c), during the lip-sync period the communication device 100 utters "みぎてのてんじぶつをごらんください" ("Please look at the exhibit on your right").

FIGS. 3(d) and 3(e) illustrate light emission waveforms of the light source 101 in light actions. The light emission waveforms of the light source 101 during the periods of defined action 1 and defined action 2 are set, for example, by the creator who creates the actions of the communication device 100. The light emission waveform of the light source 101 during the lip-sync period can be generated based on the speech waveform data of the utterance, for example, as illustrated with reference to FIG. 2.

Here, suppose that the light emission cycle of the light source 101 during the period of defined action 1 or defined action 2 differs from the light emission cycle during the lip-sync period by a distinguishable amount, as illustrated in FIG. 3(d). In this case, the viewer of the communication device 100 can recognize the switch between the defined action period and the lip-sync period. As a result, the light emission of the light source 101 during the lip-sync period makes the speech content easier for the viewer to understand.

However, suppose that the light emission cycle of the light source 101 during the period of defined action 1 or defined action 2 is close to the light emission cycle during the lip-sync period, as illustrated in FIG. 3(e). In this case, it is difficult for the viewer of the communication device 100 to recognize the switch between the defined action period and the lip-sync period. As a result, the viewer may be unable to distinguish whether the light emission of the light source 101 belongs to a defined action or is lip-sync emission indicating that the communication device 100 is speaking. Lip sync may then fail to help the viewer understand the content, or may even hinder the viewer's understanding. It is therefore desirable to provide a technique that controls the light emission of the light source 101 so that emission in a creator-generated defined action and emission by lip sync are distinguishable. A first embodiment is described below.

(First embodiment)
FIG. 4 is a diagram illustrating a block configuration of a generation device 400 that generates the action of the light source 101 during a lip-sync period according to the embodiment. The generation device 400 may be, for example, an information processing device such as a personal computer (PC), a mobile PC, or a tablet terminal that a creator uses to generate content and defined actions. The generation device 400 includes, for example, a control unit 401 and a storage unit 402. The control unit 401 may operate as, for example, an identification unit 411 and an adjustment unit 412. The storage unit 402 stores information such as the content information 700 described later. Details of these units and of the information stored in the storage unit 402 are described later.

In the embodiment described below, a lip-sync light emission waveform of the light source 101 is generated that is distinguishable from the light emission of the light source 101 in a defined action defined by the creator.

FIG. 5 is a diagram illustrating the flow of generating the light emission waveform of the light source 101 by lip sync according to the embodiment.

In step 501 (hereinafter, "step" is abbreviated as "S", for example, S501), the control unit 401 of the generation device 400 generates a light action based on the speech data for the lip-sync period. For example, the control unit 401 may generate the lip-sync light emission waveform based on the speech waveform data of the utterance, as described with reference to FIG. 2.

In S502, the control unit 401 of the generation device 400 analyzes the frequencies of the light emission waveform of the light source 101 defined by a defined action included in a period of the same content other than the lip-sync period. For example, the control unit 401 may analyze the frequency components contained in the light emission waveform of a defined action adjacent to the lip-sync period.
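One way to picture the analysis in S502 is a small discrete Fourier transform over the defined action's brightness curve. The sketch below is an assumption-laden illustration: the text does not name an analysis method, and the function names `spectrum` and `representative_frequency`, the 0.25 Hz step, and the 6 Hz upper limit are hypothetical choices for this example.

```python
import cmath
import math


def spectrum(waveform, sample_rate, max_hz=6.0, step_hz=0.25):
    """Return (frequency, magnitude) pairs for 0 < f <= max_hz.

    A direct DFT evaluated at a few low frequencies; adequate for short
    brightness curves, which is all this sketch targets.
    """
    n = len(waveform)
    mean = sum(waveform) / n
    centered = [x - mean for x in waveform]  # ignore the constant brightness offset
    pairs = []
    for k in range(1, int(round(max_hz / step_hz)) + 1):
        f = k * step_hz
        acc = sum(
            x * cmath.exp(-2j * math.pi * f * i / sample_rate)
            for i, x in enumerate(centered)
        )
        pairs.append((f, abs(acc) / n))
    return pairs


def representative_frequency(waveform, sample_rate):
    """Pick the frequency with the largest magnitude as the representative one."""
    return max(spectrum(waveform, sample_rate), key=lambda p: p[1])[0]


if __name__ == "__main__":
    rate = 50  # hypothetical sampling rate of the brightness curve, in Hz
    blink = [0.5 + 0.5 * math.sin(2 * math.pi * 1.0 * i / rate) for i in range(200)]
    print(representative_frequency(blink, rate))
```

Taking the single largest peak is the simplest possible reading of a "main" component; a waveform with several comparable peaks would need a richer summary than one number.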

In S503, the control unit 401 of the generation device 400 controls the light emission waveform of the light source 101 during the lip-sync period based on, for example, the analysis result for the defined action. For example, the control unit 401 may adjust the lip-sync light emission waveform so that its frequencies are distinguishable from the main frequency components of the light emission waveform of the light source 101 in the defined action adjacent to the lip-sync period. This reduces the risk that the viewer of the communication device 100 fails to notice the switch between the creator's defined action and the lip-sync period and confuses the two.

In lip sync, viewers tend to perceive the light as speech as long as they can see changes in light intensity that roughly correspond to the loudness of the voice. It is therefore possible to generate the light emission waveform of the light source 101 for the lip-sync period without using some of the frequency components.

FIG. 6 is a diagram illustrating adjustment of the light emission waveform of the light source 101 during the lip-sync period according to the embodiment. FIG. 6(a) shows, for example, the light emission waveform of the light source 101 generated from the speech waveform data in FIG. 2(c), which contains frequency components in a range, such as 0.3 to 6 Hz, in which humans can easily perceive changes in light intensity.

FIG. 6(b) shows a waveform obtained by suppressing the high-frequency components of the light emission waveform of FIG. 6(a). For example, the waveform of FIG. 6(b) can be obtained by passing the waveform of FIG. 6(a) through a low-pass filter. Suppose that the low-pass filter extracts components that vary at about 0.5 to 2 Hz. Components of the speech waveform data that vary at about 0.5 to 2 Hz tend to include components that follow the rise and fall in loudness during speech. Therefore, even when the light emission of the light source 101 is controlled with a waveform whose high-frequency components have been suppressed by a low-pass filter, the light emission intensity appears to the viewer to change with the loudness of the voice. As a result, the light emission of the light source 101 can give the viewer the impression that the communication device 100 is speaking.

FIG. 6(c) shows a waveform obtained by suppressing the low-frequency components of the light emission waveform of FIG. 6(a). For example, the waveform of FIG. 6(c) can be obtained by passing the waveform of FIG. 6(a) through a high-pass filter. Suppose that the high-pass filter extracts components that vary at about 2 to 6 Hz. Components of the speech waveform data that vary at about 2 to 6 Hz tend to include the changes that arise as each character of the speech is uttered. Therefore, even when the light emission of the light source 101 is controlled with a waveform whose low-frequency components have been suppressed by a high-pass filter, the light emission intensity appears to the viewer to change with the utterance of each character. As a result, the light emission of the light source 101 can give the viewer the impression that the communication device 100 is speaking.

In this way, in lip sync, as long as a waveform correlated with the speech waveform is extracted, the viewer can be made to perceive the device as speaking even when some frequency components are removed. It is therefore possible to generate the light emission waveform of the light source 101 for the lip-sync period while avoiding the frequency components of the light emission waveform used in the defined action created by the creator.
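The low-pass and high-pass variants of FIG. 6 can be sketched with very simple filters. The code below is a minimal illustration, not the filter design of the embodiment, which the text does not specify: the first-order (one-pole) filters, the 2 Hz cutoff in the demo, and the helper names `low_pass`, `high_pass`, and `amplitude_at` are assumptions of this sketch.

```python
import math


def low_pass(waveform, sample_rate, cutoff_hz):
    """First-order low-pass filter: keeps slow changes, damps fast ones."""
    dt = 1.0 / sample_rate
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)
    out = []
    level = waveform[0]
    for x in waveform:
        level += alpha * (x - level)
        out.append(level)
    return out


def high_pass(waveform, sample_rate, cutoff_hz):
    """Complementary high-pass: the input minus its low-passed version."""
    slow = low_pass(waveform, sample_rate, cutoff_hz)
    return [x - s for x, s in zip(waveform, slow)]


def amplitude_at(waveform, sample_rate, freq_hz):
    """Amplitude of the sinusoidal component at freq_hz (single-bin DFT)."""
    n = len(waveform)
    re = sum(x * math.cos(2 * math.pi * freq_hz * i / sample_rate)
             for i, x in enumerate(waveform))
    im = sum(x * math.sin(2 * math.pi * freq_hz * i / sample_rate)
             for i, x in enumerate(waveform))
    return 2.0 * math.hypot(re, im) / n


if __name__ == "__main__":
    rate = 100  # hypothetical sampling rate of the brightness curve, in Hz
    # Stand-in for FIG. 6(a): a slow 1 Hz and a fast 4 Hz component mixed.
    wave = [math.sin(2 * math.pi * 1.0 * i / rate)
            + math.sin(2 * math.pi * 4.0 * i / rate) for i in range(400)]
    for name, filtered in (("low-pass", low_pass(wave, rate, 2.0)),
                           ("high-pass", high_pass(wave, rate, 2.0))):
        print(name,
              round(amplitude_at(filtered, rate, 1.0), 2),
              round(amplitude_at(filtered, rate, 4.0), 2))
```

A first-order filter separates the two bands only gently. A steeper filter would likely be needed to suppress the unwanted band as cleanly as the figures suggest.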

As described above, humans are said to be able to perceive changes in light intensity only within a range of about 0.2 Hz to 50 Hz. Within that range, the band in which humans can easily perceive changes in light intensity is limited to some extent, for example, to 0.3 to 6 Hz. Consequently, the frequencies a creator uses for the light emission of the light source 101 in a defined action and the frequencies used for the light emission of the light source 101 in lip sync both tend to fall in the same range, such as 0.3 to 6 Hz.

However, when a creator defines the operation of the communication device 100 and creates a defined action, the creator tends to make the light source 101 emit light at a cycle suited to the purpose of the defined action. For example, when the communication device 100 is to give guidance, the creator may take the blinking cycle of a vehicle's turn signal as a reference and generate a light action whose intensity changes at a cycle close to it. Likewise, when the creator creates a defined action that makes the communication device 100 dance to music, the creator tends to choose the cycle of intensity change in the light action to match the tempo of the music.

That is, even when creators are free to create the defined actions included in content for a given purpose, the frequency components contained in the light emission waveform of the light source 101 used in those defined actions are often biased. It is therefore possible to generate the light emission waveform of the light source 101 for the lip-sync period while avoiding the frequency components of the light emission waveform used in the defined action created by the creator.

By adjusting the light emission cycle of the light action in the lip-sync period to a cycle distinguishable from the light emission cycle of the light action defined by the defined action included in the content, the switch between the creator's defined action and lip sync becomes easier to identify.

As an example, suppose that the creator has created a light action in which the light source 101 emits light at a slow cycle of about 1 Hz in a defined action. In this case, for example, frequencies below 2 Hz are filtered out of the light emission waveform of the light source 101 for the lip-sync period, and the light emission intensity of the light source 101 is controlled at a fast cycle of 2 Hz or higher. The rhythm of the light emission then differs at the switch between the defined action and the lip-sync period, so the viewer can distinguish the defined action period defined by the creator from the lip-sync period. In another example, suppose that the creator has created a light action in which the light source 101 emits light at a fast cycle of about 3 Hz in a defined action. In this case, for example, frequencies of 2 Hz or higher are filtered out of the light emission waveform of the light source 101 for the lip-sync period, and the light emission intensity of the light source 101 is controlled at a slow cycle below 2 Hz. Here too, the rhythm of the light emission differs at the switch between the defined action and the lip-sync period, so the viewer can distinguish the defined action period defined by the creator from the lip-sync period.
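The two examples above amount to a simple decision rule, sketched below. The 2 Hz split and the 0.3 to 6 Hz usable band come from the examples in the text; the function name `choose_lip_sync_band`, its parameters, and the handling of a representative frequency exactly at the split are assumptions of this sketch.

```python
def choose_lip_sync_band(representative_hz, split_hz=2.0, usable_band=(0.3, 6.0)):
    """Return the (low, high) band in Hz to keep for the lip-sync waveform.

    If the defined action is dominated by a slow rhythm, keep the fast band
    for lip sync, and vice versa, so that the two rhythms differ.
    """
    low, high = usable_band
    if not low < split_hz < high:
        raise ValueError("split_hz must lie inside the usable band")
    if representative_hz < split_hz:
        # Defined action is slow: suppress below the split, keep the fast part.
        return (split_hz, high)
    # Defined action is fast (or exactly at the split, an arbitrary choice
    # in this sketch): suppress from the split upward, keep the slow part.
    return (low, split_hz)


if __name__ == "__main__":
    print(choose_lip_sync_band(1.0))  # defined action at about 1 Hz
    print(choose_lip_sync_band(3.0))  # defined action at about 3 Hz
```

A representative frequency close to the split leaves little separation between the two rhythms, so a fixed split is only a starting point.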

従って、リップシンクの期間を、クリエイターが作成した定義アクションの期間と視聴者が混同してしまうことを抑制することができる。 Therefore, it is possible to prevent the viewer from confusing the lip-sync period with the defined action period created by the creator.

以下、実施形態に係るリップシンクの期間における光アクションの生成について更に詳細に説明する。 The generation of light actions during lip-sync according to embodiments is described in more detail below.

図７は、実施形態に係るコンテンツ情報７００を例示する図である。コンテンツ情報７００には、コンテンツにおけるコミュニケーション装置１００の動作が規定されている。コンテンツ情報７００には、例えば、時間、発話、光アクション、関節角１、関節角２を対応づけたエントリが登録されている。時間は、例えば、コンテンツにおいてエントリの動作を実行する期間を示す情報である。発話は、例えば、コミュニケーション装置１００に発話させる文字列が登録されている。なお、図７の例では、エントリと対応する動作期間においてコミュニケーション装置１００が発話しない場合には、発話には「なし」が登録されている。光アクションには、エントリと対応する期間における光源１０１の発光を指定する情報が登録されている。なお、図７の例では、エントリと対応する動作期間においてコミュニケーション装置１００が光源１０１を発光させない場合には、光アクションに「なし」が登録されている。また、エントリと対応する動作期間においてコミュニケーション装置１００の光源１０１の発光を、リップシンクにより制御する場合には、コンテンツ情報７００のエントリの発話には「リップシンク」が登録されている。 FIG. 7 is a diagram illustrating content information 700 according to the embodiment. The content information 700 defines the operation of the communication device 100 in content. In the content information 700, for example, entries are registered in which time, utterance, light action, joint angle 1, and joint angle 2 are associated with each other. The time is, for example, information indicating a period during which the operation of the entry is executed in the content. For the utterance, for example, a character string to be uttered by the communication device 100 is registered. In the example of FIG. 7, when the communication device 100 does not speak during the operation period corresponding to the entry, "none" is registered as the speech. Information specifying light emission of the light source 101 in a period corresponding to the entry is registered in the light action. Note that in the example of FIG. 7, when the communication device 100 does not cause the light source 101 to emit light during the operation period corresponding to the entry, "none" is registered as the light action. Also, when the light emission of the light source 101 of the communication device 100 is controlled by lip sync during the operation period corresponding to the entry, "lip sync" is registered in the utterance of the entry of the content information 700 .

関節角１および関節角２は、例えば、エントリと対応する動作期間におけるコミュニケーション装置１００が備えるそれぞれの関節の角度を定義する情報である。なお、コミュニケーション装置１００が備える関節は、関節角１および関節角２に限定されるものではなく、更に多くの関節を含んでもよいし、別の実施形態では、コミュニケーション装置１００は関節を含まなくてもよい。以下で述べる実施形態では、例えば、関節角１は、コミュニケーション装置１００と向かい合ってみた場合に視聴者から見て右側の腕関節の角度であり、また、関節角２は、左側の腕関節の角度である場合を例に説明を行う。 The joint angle 1 and the joint angle 2 are, for example, information defining the angles of the respective joints of the communication device 100 during the operation period corresponding to the entry. Note that the joints of the communication device 100 are not limited to those of joint angle 1 and joint angle 2; the device may include more joints, and in another embodiment the communication device 100 may include no joints at all. The embodiments described below take as an example the case where joint angle 1 is the angle of the arm joint on the right side as seen by a viewer facing the communication device 100, and joint angle 2 is the angle of the arm joint on the left side.

そして、クリエイターは、コンテンツ情報７００にコミュニケーション装置１００の動作を定義することで、コミュニケーション装置１００に様々な動作を行わせることができる。例えば、図７のコンテンツ情報７００の例では、時刻が０秒から３秒まではクリエイターが定義した定義アクションの期間である。クリエイターは０秒から３秒の期間においてコミュニケーション装置１００の発話の内容、光アクション、各関節角の角度などを設定することでコミュニケーション装置１００の動作を定義することができる。また、図７の例では、３秒から８秒まではリップシンクの期間であり、クリエイターは、光アクションにリップシンクと設定することで、エントリの発話の文字列に応じた音声の波形に合わせて、光源１０１の発光を制御することができる。 By defining the actions of the communication device 100 in the content information 700, the creator can cause the communication device 100 to perform various actions. For example, in the content information 700 of FIG. 7, the time from 0 seconds to 3 seconds is the period of a defined action defined by the creator. The creator can define the operation of the communication device 100 in the period from 0 to 3 seconds by setting the content of the speech, the light action, the angle of each joint, and so on. Also, in the example of FIG. 7, the period from 3 seconds to 8 seconds is a lip-sync period: by setting the light action to lip sync, the creator can have the light emission of the light source 101 controlled to match the waveform of the voice corresponding to the character string of the entry's utterance.
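The table structure described for the content information 700 can be sketched as a small data model. This is only an illustrative sketch: the class name `ContentEntry`, the field names, the `"lip_sync"` marker, the waveform name `"blink_1hz"`, the utterance text, and the joint angle values are all hypothetical. Only the columns (time, utterance, light action, joint angle 1, joint angle 2) and the 0 to 3 second and 3 to 8 second periods come from the description, and the sketch follows the reading in which lip sync is set in the light action column.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

LIP_SYNC = "lip_sync"  # hypothetical marker for a lip-sync light action


@dataclass
class ContentEntry:
    """One row of the content information (field names are assumptions)."""
    start_s: float
    end_s: float
    utterance: Optional[str]     # None stands for "none" (no speech)
    light_action: Optional[str]  # None, LIP_SYNC, or a creator-defined waveform name
    joint_angle_1: float         # degrees, right arm as seen by the viewer
    joint_angle_2: float         # degrees, left arm as seen by the viewer


def lip_sync_periods(entries: List[ContentEntry]) -> List[Tuple[float, float]]:
    """Return the (start, end) periods whose light action is lip sync."""
    return [(e.start_s, e.end_s) for e in entries if e.light_action == LIP_SYNC]


# Hypothetical content: a defined action from 0 s to 3 s, lip sync from 3 s to 8 s.
content = [
    ContentEntry(0.0, 3.0, None, "blink_1hz", 30.0, 30.0),
    ContentEntry(3.0, 8.0, "Hello, nice to meet you.", LIP_SYNC, 0.0, 0.0),
]
```

Keeping the lip-sync marker in the same column as the creator-defined waveforms makes it straightforward to split the timeline into defined-action periods and lip-sync periods before any frequency analysis.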

続いて、図８は、実施形態に係るリップシンクによる光アクションの生成処理の動作フローを例示する図である。例えば、コミュニケーション装置１００の制御部４０１は、コンテンツの動作の実行指示が入力されると、図８の動作フローを開始してよい。 Next, FIG. 8 is a diagram illustrating an operation flow of light action generation processing by lip sync according to the embodiment. For example, the control unit 401 of the communication device 100 may start the operation flow of FIG. 8 when an instruction to execute the content operation is input.

Ｓ８０１において制御部４０１は、実行指示が入力されたコンテンツの動作を規定するコンテンツ情報７００を読み出す。 In S801, the control unit 401 reads the content information 700 that defines the operation of the content for which the execution instruction has been input.

Ｓ８０２において制御部４０１は、読み出したコンテンツ情報７００に含まれる発話の情報を参照し、リップシンクの期間における音声波形データを取得する。例えば、制御部４０１は、コンテンツ情報７００の発話に登録されている文字列から、その文字列を発話した音声を合成して、音声波形データを取得してよい。或いは、別の実施形態では、例えば、コンテンツ情報７００には、発話の文字列の代わりに、または発話の文字列に加えて、音声波形データが登録されていてもよい。この場合、制御部４０１は、Ｓ８０２の処理において、コンテンツ情報７００からリップシンクの期間における音声波形データを読み出してよい。 In S802, the control unit 401 refers to the utterance information included in the read content information 700 and acquires the voice waveform data for the lip-sync period. For example, the control unit 401 may obtain the voice waveform data by synthesizing speech from the character string registered as the utterance in the content information 700. Alternatively, in another embodiment, voice waveform data may be registered in the content information 700 instead of, or in addition to, the character string of the utterance. In this case, the control unit 401 may read the voice waveform data for the lip-sync period from the content information 700 in the process of S802.

Ｓ８０３において制御部４０１は、Ｓ８０２で生成した音声波形データから、リップシンクの期間における光源１０１の発光波形を生成する。制御部４０１は、例えば、音声波形データの振幅に応じて光源１０１の発光強度の波形を生成してよい。一例では、制御部４０１は、音声波形データの包絡線を求めることで、光源１０１の発光波形を生成してよい。なお、実施形態はこれに限定されるものではない。別の実施形態では制御部４０１は、リップシンクで利用する所定の周波数帯域（例えば、０．３～６Ｈｚ）の成分を通過させる帯域通過フィルタで音声波形データを処理し、得られた波形の信号値に比例した光源１０１の発光強度を有する発光波形を生成してよい。また、更に別の実施形態では、比例ではなく指数関数を用いてもよく、或いは、発光強度が所定値を超える場合には、所定値に発光強度を制限する関数などの変換関数を用いて光源１０１の発光波形を生成してもよい。 In S803, the control unit 401 generates the light emission waveform of the light source 101 for the lip-sync period from the voice waveform data obtained in S802. For example, the control unit 401 may generate a waveform of the light emission intensity of the light source 101 according to the amplitude of the voice waveform data. In one example, the control unit 401 may generate the light emission waveform of the light source 101 by obtaining the envelope of the voice waveform data. Note that the embodiment is not limited to this. In another embodiment, the control unit 401 may process the voice waveform data with a band-pass filter that passes components in a predetermined frequency band used for lip sync (for example, 0.3 to 6 Hz), and generate a light emission waveform in which the light emission intensity of the light source 101 is proportional to the signal value of the resulting waveform. In yet another embodiment, an exponential function may be used instead of a proportional one, or the light emission waveform of the light source 101 may be generated using a conversion function such as one that limits the light emission intensity to a predetermined value when it would exceed that value.
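The envelope-based variant of S803 might look like the following minimal sketch. It assumes a rectify-and-moving-average envelope, which is just one simple way to estimate an envelope and is not stated in the description. The function names, the `window` parameter, and the normalisation to the range 0 to 1 are assumptions made for illustration. The `limit` argument mirrors the conversion function that caps the emission intensity at a predetermined value.

```python
from typing import List, Sequence


def envelope(samples: Sequence[float], window: int) -> List[float]:
    """Estimate the envelope as a trailing moving average of |sample|."""
    rectified = [abs(s) for s in samples]
    out: List[float] = []
    acc = 0.0
    for i, value in enumerate(rectified):
        acc += value
        if i >= window:
            acc -= rectified[i - window]  # drop the sample leaving the window
        out.append(acc / min(i + 1, window))
    return out


def to_intensity(env: Sequence[float], limit: float = 1.0) -> List[float]:
    """Scale the envelope proportionally to its peak, then cap it at `limit`."""
    peak = max(env) or 1.0  # avoid dividing by zero for an all-silent input
    return [min(limit, value / peak) for value in env]
```

For example, `to_intensity(envelope(audio, window=400))` would give one intensity value per audio sample; a real implementation would typically also resample this down to the update rate of the light source.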

Ｓ８０４において制御部４０１は、コンテンツ情報７００に含まれるリップシンク以外の期間における光源１０１の発光波形の周波数成分を分析する。制御部４０１は、一例では、コンテンツ情報７００に登録されているリップシンク以外の期間での光源１０１の発光波形の周波数成分をＦＦＴ（fast Fourier transform）を用いて分析してよい。なお、周波数成分の分析は、これに限定されるものではなく、その他の手法が用いられてもよい。 In S804, the control unit 401 analyzes the frequency components of the light emission waveform of the light source 101 in the periods other than lip sync included in the content information 700. As an example, the control unit 401 may use an FFT (fast Fourier transform) to analyze the frequency components of the light emission waveform of the light source 101 in the periods other than lip sync registered in the content information 700. Note that the analysis of frequency components is not limited to this, and other techniques may be used.
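The frequency analysis in S804 can be illustrated with a short sketch. To stay within the standard library, a naive discrete Fourier transform stands in here for the FFT named in the text; it produces the same spectrum but is slower. The function name, the choice to remove the mean (so that the constant brightness offset does not dominate), and the `(frequency, amplitude)` output format are assumptions.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def amplitude_spectrum(samples: Sequence[float],
                       sample_rate_hz: float) -> List[Tuple[float, float]]:
    """Return (frequency in Hz, amplitude) pairs for the bins below Nyquist."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # ignore the constant (0 Hz) offset
    spectrum = []
    for k in range(1, (n - 1) // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(centered))
        spectrum.append((k * sample_rate_hz / n, abs(acc) * 2 / n))
    return spectrum
```

The frequency resolution is `sample_rate_hz / n`, so a defined action that lasts only a few seconds gives fairly coarse bins. This matters when the bins are compared against narrow ranges such as ±20% of a frequency.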

Ｓ８０５において制御部４０１は、分析結果に基づいて、リップシンク以外の期間における光源１０１の発光波形の周波数成分に所定の条件を満たす偏りがあるか否かを判定する。一例では、制御部４０１は、リップシンク以外の期間における光源１０１の発光波形に含まれる周波数成分が形成する帯域を特定する。そして、制御部４０１は、その帯域が、任意の周波数に対して±２０％で表すことが可能な範囲内に収まるか（例えば、１．０Ｈｚ±０．２Ｈｚ、２．０Ｈｚ±０．４Ｈｚ等）によって周波数成分に偏りがあるか否かを判断してよい。また、制御部４０１は、リップシンク以外の期間での光源１０１の発光波形に含まれる周波数成分が形成する帯域を特定する際に、所定の強度（例えば、最大ピークの３０％の強度）以上の周波数成分を抽出してから帯域を特定してもよい。 In S805, the control unit 401 determines, based on the analysis result, whether the frequency components of the light emission waveform of the light source 101 in the periods other than lip sync have a bias that satisfies a predetermined condition. In one example, the control unit 401 identifies the band formed by the frequency components included in the light emission waveform of the light source 101 in the periods other than lip sync. The control unit 401 may then judge whether the frequency components are biased according to whether that band fits within a range that can be expressed as some frequency ±20% (for example, 1.0 Hz ±0.2 Hz or 2.0 Hz ±0.4 Hz). In addition, when identifying the band formed by the frequency components included in the light emission waveform of the light source 101 in the periods other than lip sync, the control unit 401 may first extract the frequency components with at least a predetermined intensity (for example, 30% of the maximum peak) and then identify the band.

そして、リップシンク以外の期間における光源１０１の発光波形に含まれる周波数成分が形成する帯域が、任意の周波数に対して±２０％で表すことが可能な範囲を超えた幅を有する場合、制御部４０１は、帯域に偏りがないと判定してよい。この場合、リップシンク以外の期間における光源１０１の発光波形に含まれる周波数成分が、光アクションで利用される０．３～６Ｈｚなどの所定の周波数範囲において、広範に分布していることを示している。この場合、リップシンクの発光周波数を、リップシンク以外の期間の周波数と混同を避けるように設定することが難しいことがある。そのため、制御部４０１は、Ｓ８０５でＮＯと判定してよく、Ｓ８０６で、コンテンツでの利用周波数を偏らせるように修正を促す警告情報を出力し、本動作フローは終了する。 If the band formed by the frequency components included in the light emission waveform of the light source 101 in the periods other than lip sync is wider than any range that can be expressed as a frequency ±20%, the control unit 401 may determine that the band is not biased. This indicates that the frequency components included in the light emission waveform of the light source 101 in the periods other than lip sync are widely distributed over the predetermined frequency range used for light actions, such as 0.3 to 6 Hz. In this case, it may be difficult to set the lip-sync emission frequency so as to avoid confusion with the frequencies of the periods other than lip sync. The control unit 401 may therefore determine NO in S805 and, in S806, output warning information prompting the creator to revise the content so that the frequencies it uses are biased, after which this operation flow ends.

一方、例えば、リップシンク以外の期間における光源１０１の発光波形に含まれる周波数成分が形成する帯域が、任意の周波数に対して±２０％で表すことが可能な範囲に収まる幅で分布しているとする。この場合、リップシンク以外の期間の光源１０１の発光波形に含まれる周波数成分に偏りがあり、リップシンクの期間の光アクションの周波数を、リップシンク以外の期間における光源１０１の発光波形に含まれる周波数成分を避けて設定することが可能である。そのため、制御部４０１は、Ｓ８０５でＹＥＳと判定してよく、フローはＳ８０７に進む。なお、Ｓ８０５における偏りの判定は、これに限定されるものではなく、その他の手法で実行されてもよい。例えば、偏りを判定するために用いる周波数範囲の幅は、±２０％に限定されるものではなく、±５％から±５０％などその他の範囲に設定されてもよい。また、偏りの判定に用いる周波数の幅は、％（パーセント）で表されなくてもよく、例えば、０．５～２．０Ｈｚなど、所定の幅で偏りの判定が実行されてもよい。 On the other hand, suppose that the band formed by the frequency components included in the light emission waveform of the light source 101 in the periods other than lip sync is narrow enough to fit within a range that can be expressed as some frequency ±20%. In this case, the frequency components included in the light emission waveform of the light source 101 in the periods other than lip sync are biased, and the frequency of the light action in the lip-sync period can be set so as to avoid them. The control unit 401 may therefore determine YES in S805, and the flow proceeds to S807. Note that the bias determination in S805 is not limited to this and may be performed by other methods. For example, the width of the frequency range used to determine the bias is not limited to ±20% and may be set to another value, such as one between ±5% and ±50%. The width of the frequencies used for the bias determination also need not be expressed as a percentage; for example, the bias determination may be performed with a predetermined width such as 0.5 to 2.0 Hz.
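The bias test of S805 reduces to a single inequality. A band from `lo` to `hi` fits inside some range f ± 20% exactly when a centre f exists with 0.8·f ≤ lo and hi ≤ 1.2·f, which is equivalent to hi·0.8 ≤ lo·1.2. The sketch below applies that check after keeping only components at or above 30% of the maximum peak. The function names and the `(frequency, amplitude)` list format are assumptions; the 20% and 30% defaults are the example values from the text.

```python
from typing import Sequence, Tuple


def occupied_band(spectrum: Sequence[Tuple[float, float]],
                  rel_threshold: float = 0.3) -> Tuple[float, float]:
    """Band spanned by components at or above rel_threshold x the maximum peak."""
    peak = max(amplitude for _, amplitude in spectrum)
    freqs = [f for f, amplitude in spectrum if amplitude >= rel_threshold * peak]
    return min(freqs), max(freqs)


def is_biased(band: Tuple[float, float], tolerance: float = 0.2) -> bool:
    """True if the band fits within (some frequency) +/- tolerance."""
    lo, hi = band
    # A centre f with f*(1-tol) <= lo and hi <= f*(1+tol) exists
    # exactly when hi*(1-tol) <= lo*(1+tol), assuming lo > 0.
    return hi * (1 - tolerance) <= lo * (1 + tolerance)
```

With these defaults, a band of 0.9 to 1.1 Hz counts as biased (the YES branch, continuing to S807), whereas a band of 0.5 to 3.0 Hz does not (the NO branch, leading to the warning in S806).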

Ｓ８０７において制御部４０１は、リップシンク以外の期間における光源１０１の発光波形を代表する代表周波数成分を特定する。一例では、制御部４０１は、リップシンク以外の期間における光源１０１の発光波形の周波数成分のうち、最大ピークの周波数をリップシンク以外の期間における光源１０１の発光波形を代表する代表周波数成分として特定してよい。 In S807, the control unit 401 identifies a representative frequency component that represents the light emission waveform of the light source 101 in the periods other than lip sync. In one example, the control unit 401 may identify, among the frequency components of the light emission waveform of the light source 101 in the periods other than lip sync, the frequency of the maximum peak as the representative frequency component.

なお、代表周波数成分の特定は、これに限定されるものではなく、その他の手法で決定されてもよい。例えば、別の実施形態では、制御部４０１は、リップシンク以外の期間における光源１０１の発光波形の周波数成分をエネルギーの高い周波数成分順にソートする。そして、制御部４０１は、エネルギーの値が上位３０％に入る周波数成分の周波数の平均値を、代表周波数成分として特定してよい。また、平均値の算出の際には、エネルギーの値で重みづけをした重みづけ平均を用いてもよい。更に別の実施形態では、制御部４０１は、リップシンク以外の期間における光源１０１の発光波形の周波数成分をエネルギーの高い周波数成分順にソートし、エネルギーの値が上位３０％に入る成分が形成する周波数帯域を代表周波数成分として特定してもよい。 Note that the identification of the representative frequency component is not limited to this, and may be determined by other methods. For example, in another embodiment, the control unit 401 sorts the frequency components of the light emission waveform of the light source 101 in periods other than the lip-sync period in descending order of energy. Then, the control section 401 may specify the average value of the frequencies of the frequency components whose energy values are in the top 30% as the representative frequency component. Also, when calculating the average value, a weighted average weighted by the energy value may be used. In yet another embodiment, the control unit 401 sorts the frequency components of the light emission waveform of the light source 101 in the period other than the lip-sync period in order of the frequency components with the highest energy, and the frequencies formed by the components whose energy values are in the top 30%. A band may be identified as a representative frequency component.
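Two of the ways of choosing the representative frequency component in S807 can be sketched as follows. The sketch makes two assumptions: "energy" is taken to be the squared amplitude, and "the top 30%" is read as the top 30% of components when ranked by energy. The function names and the `(frequency, amplitude)` list format are also hypothetical.

```python
import math
from typing import Sequence, Tuple


def peak_frequency(spectrum: Sequence[Tuple[float, float]]) -> float:
    """Frequency of the maximum peak."""
    return max(spectrum, key=lambda pair: pair[1])[0]


def weighted_top_frequency(spectrum: Sequence[Tuple[float, float]],
                           top_fraction: float = 0.3) -> float:
    """Energy-weighted mean frequency of the highest-energy components."""
    ranked = sorted(spectrum, key=lambda pair: pair[1] ** 2, reverse=True)
    count = max(1, math.ceil(len(ranked) * top_fraction))
    top = ranked[:count]
    total_energy = sum(a * a for _, a in top)
    return sum(f * a * a for f, a in top) / total_energy
```

The peak frequency is the simplest choice, but it can jump between neighbouring bins when two peaks are of similar height. The weighted mean is steadier in that situation, at the cost of possibly landing between the bins that actually carry energy.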

Ｓ８０８において制御部４０１は、代表周波数成分に基づいて、リップシンクの期間における光源１０１の発光波形において抑制する抑制対象の周波数成分を決定し、決定した抑制対象の周波数成分を抑制する。例えば、Ｓ８０７で代表周波数成分として１つの周波数を特定した場合、制御部４０１は、代表周波数成分のプラスおよびマイナス方向に所定の幅（例えば、±０．２Ｈｚ、代表する周波数×０．１Ｈｚの幅など）の帯域を抑制対象の周波数成分として決定する。そして、制御部４０１は、リップシンクの期間における光源１０１の発光波形のうちで、抑制対象の周波数成分を減衰させる。また、減衰の強度は、減衰帯域において１０～５０％の通過利得に設定されてよい。一例では、減衰の強度は、抑制対象の周波数成分で形成される減衰帯域において３０％の通過利得、および減衰帯域の中心の周波数において２０％の通過利得に設定されてよい。 In S808, the control unit 401 determines, based on the representative frequency component, the frequency components to be suppressed in the light emission waveform of the light source 101 during the lip-sync period, and suppresses them. For example, when a single frequency was identified as the representative frequency component in S807, the control unit 401 determines a band of a predetermined width on the plus and minus sides of the representative frequency component (for example, ±0.2 Hz, or a width of the representative frequency × 0.1 Hz) as the frequency components to be suppressed. The control unit 401 then attenuates those frequency components in the light emission waveform of the light source 101 during the lip-sync period. The strength of the attenuation may be set to a pass gain of 10 to 50% in the attenuation band. In one example, the attenuation may be set to a pass gain of 30% in the attenuation band formed by the frequency components to be suppressed, and a pass gain of 20% at the center frequency of the attenuation band.

また、例えば、代表周波数成分として、Ｓ８０７で周波数帯域が特定された場合には、制御部４０１は、代表周波数成分を抑制対象の周波数帯域とし、抑制対象の周波数帯域を５０～１０％の通過利得で減衰させてよい。一例では、制御部４０１は、抑制対象の周波数帯域を３０％の通過利得で減衰させる。 Further, for example, when a frequency band was identified as the representative frequency component in S807, the control unit 401 may take the representative frequency component as the frequency band to be suppressed and attenuate that band with a pass gain of 50 to 10%. In one example, the control unit 401 attenuates the frequency band to be suppressed with a pass gain of 30%.

また、減衰に用いる帯域通過フィルタは、様々な方式で実装することができる。一例では、帯域通過フィルタは、光源１０１の発光波形をＦＦＴして、対象の周波数成分を減衰させた後、ＩＦＦＴ（inverse fast Fourier transform）により減衰後の発光波形データを取得することで実装されてよい。或いは、帯域通過フィルタは、ＦＩＲ（Finite Impulse Response）フィルタ等の実空間のフィルタを用いて近似的に実装されてもよい。 The band-pass filter used for the attenuation can be implemented in various ways. In one example, the band-pass filter may be implemented by applying an FFT to the light emission waveform of the light source 101, attenuating the target frequency components, and then obtaining the attenuated light emission waveform data by an IFFT (inverse fast Fourier transform). Alternatively, the band-pass filter may be implemented approximately using a real-space filter such as an FIR (Finite Impulse Response) filter.
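The transform, attenuate, and inverse-transform approach described above can be sketched as follows. To stay within the standard library, a naive DFT and inverse DFT stand in for the FFT and IFFT named in the text. The function names and parameters are hypothetical; the ±0.2 Hz half-width and the 30% pass gain are the example values given for S808.

```python
import cmath
import math
from typing import List, Sequence


def dft(x: Sequence[float]) -> List[complex]:
    n = len(x)
    return [sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(x))
            for k in range(n)]


def idft(spectrum: Sequence[complex]) -> List[float]:
    n = len(spectrum)
    return [(sum(v * cmath.exp(2j * math.pi * k * t / n)
                 for k, v in enumerate(spectrum)) / n).real
            for t in range(n)]


def suppress_band(samples: Sequence[float], sample_rate_hz: float,
                  center_hz: float, half_width_hz: float = 0.2,
                  pass_gain: float = 0.3) -> List[float]:
    """Scale the components within center_hz +/- half_width_hz by pass_gain."""
    n = len(samples)
    spectrum = dft(samples)
    for k in range(n):
        # Bins above n/2 mirror the negative frequencies, so fold them back.
        freq = min(k, n - k) * sample_rate_hz / n
        if abs(freq - center_hz) <= half_width_hz:
            spectrum[k] *= pass_gain
    return idft(spectrum)
```

Scaling both the positive-frequency bin and its mirror image keeps the output real-valued. Because the gain changes abruptly at the edges of the band, a production implementation might taper the gain there to limit ringing in the time-domain waveform.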

Ｓ８０９において制御部４０１は、取得した減衰後の発光波形のデータをリップシンクの期間における光アクションのデータとして記憶部４０２に保存し、本動作フローは終了する。一例では、制御部４０１は、コンテンツ情報７００の発話がリップシンクに設定されているエントリと対応づけて減衰後の発光波形データを記憶部４０２に保存してよい。 In S809, the control unit 401 stores the obtained attenuated light emission waveform data in the storage unit 402 as the light action data for the lip-sync period, and this operation flow ends. In one example, the control unit 401 may store the attenuated light emission waveform data in the storage unit 402 in association with the entry of the content information 700 whose utterance is set to lip sync.

以上で述べたように、図８の動作フローによれば、制御部４０１は、クリエイターにより定義された定義アクションの期間における光源１０１の発光と区別可能に、リップシンクの期間の光アクションを生成することができる。 As described above, according to the operation flow of FIG. 8, the control unit 401 generates the light action during the lip-sync period so as to be distinguishable from the light emission of the light source 101 during the defined action period defined by the creator. be able to.

（変形例１） (Modification 1)
続いて、第１の実施形態の変形例を説明する。上述の実施形態では、例えば、Ｓ８０４～Ｓ８０７の処理で、コンテンツに含まれるリップシンク以外の期間における光アクションの周波数を用いて、偏りの判定や、抑制対象の周波数成分の決定を行っている。しかしながら、実施形態はこれに限定されるものではない。例えば、クリエイターが定義した光アクションと、リップシンクによる光アクションとの混同を防ぐには、切り替わりの前後において周波数を異ならせれば十分なことがある。そのため、変形例では、制御部４０１は、クリエイターが定義した光アクションのうちで、切り替わりの時点から所定期間内にある光アクションの周波数成分を用いて、偏りの判定や、抑制対象の周波数成分の決定を行う。 Next, a modification of the first embodiment will be described. In the embodiment described above, for example, the processing of S804 to S807 uses the frequencies of the light actions in the periods other than lip sync included in the content to determine the bias and to determine the frequency components to be suppressed. However, embodiments are not limited to this. For example, to prevent confusion between a creator-defined light action and a lip-sync light action, it may be sufficient for the frequencies to differ just before and after the switch. In this modification, therefore, the control unit 401 determines the bias and the frequency components to be suppressed using only the frequency components of the part of the creator-defined light action that lies within a predetermined period of the switching time.

図９は、実施形態に係る所定期間の光アクションに基づいて、偏りの判定や、抑制対象の周波数成分の決定を行う例を示す図である。例えば、制御部４０１は、図９（ａ）に示すように、リップシンクの前に隣接するクリエイターが定義した光アクションにおいて、リップシンクに切り替わる直前の所定期間の波形を取得する。そして、制御部４０１は、取得した所定期間の波形の周波数成分に基づいて、偏りの判定や、抑制対象の周波数成分の決定を行ってよい。 FIG. 9 is a diagram illustrating an example of determination of bias and determination of frequency components to be suppressed based on light actions for a predetermined period according to the embodiment. For example, as shown in FIG. 9A, the control unit 401 acquires a waveform for a predetermined period immediately before switching to lip-sync in the adjacent creator-defined light action before lip-sync. Then, the control unit 401 may determine the bias and determine the frequency component to be suppressed based on the acquired frequency component of the waveform for the predetermined period.

また、制御部４０１は、図９（ｂ）に示すように、リップシンクの後に隣接するクリエイターが定義した光アクションにおいて、リップシンクから切り替わった直後の所定期間の波形を取得する。そして、制御部４０１は、取得した所定期間の波形の周波数成分に基づいて、偏りの判定や、抑制対象の周波数成分の決定を行ってよい。 In addition, as shown in FIG. 9B, the control unit 401 acquires a waveform for a predetermined period immediately after switching from lip-sync in an adjacent creator-defined light action after lip-sync. Then, the control unit 401 may determine the bias and determine the frequency component to be suppressed based on the acquired frequency component of the waveform for the predetermined period.

更には、制御部４０１は、例えば、図９（ｃ）に示すように、リップシンクの前後に隣接するクリエイターが定義した光アクションと、リップシンクとの切り替わり時点から所定期間にあるクリエイターが定義した光アクションの波形を取得する。そして、制御部４０１は、取得した所定期間の波形の周波数成分に基づいて、偏りの判定や、抑制対象の周波数成分の決定を行ってよい。 Furthermore, as shown in FIG. 9C for example, the control unit 401 acquires, for the creator-defined light actions adjacent both before and after the lip sync, the waveforms of the creator-defined light actions that lie within a predetermined period of each switch to or from the lip sync. The control unit 401 may then determine the bias and determine the frequency components to be suppressed based on the frequency components of the acquired waveforms for the predetermined period.

以上のように、制御部４０１は、上述のＳ８０４において所定期間の光源１０１の発光波形を取得し、周波数を分析してよい。そして、Ｓ８０５では、制御部４０１は、抽出した所定期間の光源１０１の発光波形の周波数成分に偏りがあるか否かを判定してよい。また、Ｓ８０７およびＳ８０８において制御部４０１は、抽出した所定期間の光源１０１の発光波形の周波数成分に基づいて代表周波数成分を特定し、特定した代表周波数成分から抑制対象の周波数成分を決定してよい。一例では、制御部４０１は、抽出した所定期間の光源１０１の発光波形の周波数成分のうち、最大ピークの周波数を代表周波数成分として特定してよい。 As described above, the control unit 401 may acquire the light emission waveform of the light source 101 for a predetermined period and analyze the frequency in S804 described above. Then, in S805, the control unit 401 may determine whether or not there is a bias in the extracted frequency components of the light emission waveform of the light source 101 during the predetermined period. In S807 and S808, the control unit 401 may specify a representative frequency component based on the extracted frequency component of the light emission waveform of the light source 101 during the predetermined period, and determine the frequency component to be suppressed from the specified representative frequency component. . In one example, the control unit 401 may specify, as the representative frequency component, the maximum peak frequency among the extracted frequency components of the light emission waveform of the light source 101 during the predetermined period.
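Selecting the part of a creator-defined waveform that lies within the predetermined period of a switch amounts to simple slicing, as in this minimal sketch. The function names and the sample-based representation of the waveform are assumptions; the length of the predetermined period is left as a parameter because the description does not fix it.

```python
from typing import List, Sequence


def window_before_switch(defined_wave: Sequence[float], sample_rate_hz: float,
                         window_s: float) -> List[float]:
    """Last window_s seconds of the defined action that precedes lip sync."""
    count = int(round(window_s * sample_rate_hz))
    # Guard against count == 0, because wave[-0:] would return the whole wave.
    return list(defined_wave[-count:]) if count > 0 else []


def window_after_switch(defined_wave: Sequence[float], sample_rate_hz: float,
                        window_s: float) -> List[float]:
    """First window_s seconds of the defined action that follows lip sync."""
    count = int(round(window_s * sample_rate_hz))
    return list(defined_wave[:count])
```

The selected samples would then be passed to the same frequency analysis, bias determination, and representative-frequency steps that the full waveform goes through in the base embodiment.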

例えば、以上のように、切り替わりから所定期間にあるクリエイターが定義した光アクションの発光波形を用いることで、クリエイターが定義した光アクションとリップシンクによる光アクションとの混同を効率的に抑制することができる。また、クリエイターは、所定期間以外の期間において自由に光源１０１の発光波形を設定して光アクションを生成することができる。 For example, as described above, by using the light emission waveform of the light action defined by the creator in a predetermined period after switching, confusion between the light action defined by the creator and the light action by lip sync can be efficiently suppressed. can. In addition, the creator can freely set the light emission waveform of the light source 101 during a period other than the predetermined period to generate a light action.

（変形例２） (Modification 2)
続いて、第１の実施形態の別の変形例を説明する。上述のように、例えば、クリエイターが定義した光アクションと、リップシンクによる光アクションとの混同を防ぐには、切り替わりの前後において周波数を異ならせれば十分なことがある。そのため、以下の変形例では、制御部４０１は、Ｓ８０８の処理でクリエイターが定義した光アクションとリップシンクによる光アクションとの切り替わりの時点では、抑制対象の周波数成分の減衰強度を強くし、それ以外の期間では弱くするように制御する。 Next, another modification of the first embodiment will be described. As noted above, to prevent confusion between a creator-defined light action and a lip-sync light action, it may be sufficient for the frequencies to differ just before and after the switch. In the following modification, therefore, the control unit 401 performs control in the process of S808 so that the attenuation of the frequency components to be suppressed is strong at the time of switching between the creator-defined light action and the lip-sync light action, and weak in the other periods.

図１０は、変形例に係る抑制対象の周波数成分の抑制制御を例示する図である。図１０は、縦軸にフィルタの制御強度をとり、横軸にリップシンクにおける経過時間をとったグラフである。図１０に示す例では、リップシンクの開始時と終了時において、フィルタの制御強度を１００％としている。 FIG. 10 is a diagram illustrating suppression control of suppression target frequency components according to the modification. FIG. 10 is a graph in which the vertical axis represents the control strength of the filter and the horizontal axis represents the elapsed time in lip sync. In the example shown in FIG. 10, the filter control strength is 100% at the start and end of lip-sync.

また、図１０の例では、リップシンクの開始時からの経過時間に応じてフィルタ強度を下げており、所定時間経過後に０％としている。また、図１０の例では、リップシンクの終了時から所定時間前においてフィルタ強度を０％としており、そこから終了時間までの期間でフィルタ強度を徐々に上げている。 Further, in the example of FIG. 10, the filter strength is lowered according to the elapsed time from the start of the lip-sync, and is set to 0% after the elapse of a predetermined time. In the example of FIG. 10, the filter strength is set to 0% a predetermined time before the end of the lip-sync, and the filter strength is gradually increased during the period from then until the end time.

このようなフィルタの強度の制御は、例えば、以下により実行することができる。例えば、リップシンクへの切り替わりからの経過時間をＴ１とする。また、リップシンクの開始から終了までにかかる発話の所要時間は、例えば、発話の文字列などから見積もることができる。そして、リップシンクの残り時間は、発話の所要時間から経過時間：Ｔ１を差し引くことで求めることができ、この残り時間をＴ２とする。この場合に、フィルタの制御強度は、Ｔ１およびＴ２のいずれかが小さな場合に強くすればよい。例えば、Ｔ１およびＴ２のうちの小さい方の時間をＴとし、フィルタの強度の制御を行う所定時間の長さを１０秒とした場合、フィルタ強度は以下の式１で設定することができる。 Such control of the filter strength can be performed, for example, as follows. Let T1 be the time elapsed since the switch to lip sync. The time required for the speech from the start to the end of the lip sync can be estimated from, for example, the character string of the utterance. The remaining lip-sync time can then be obtained by subtracting the elapsed time T1 from the time required for the speech; let this remaining time be T2. The filter control strength should then be made strong when either T1 or T2 is small. For example, if T is the smaller of T1 and T2, and the predetermined period over which the filter strength is controlled is 10 seconds, the filter strength can be set by Formula 1 below.
（１０－Ｔ）／１０×１００［％］・・・式１ (10 − T) / 10 × 100 [%] ... Formula 1

この場合、Ｔが１０秒以上ではフィルタの制御強度は０％となり、また、Ｔが１０以下の値である場合、Ｔが小さくなるにつれてフィルタの制御強度を強くすることができる。なお、フィルタの制御強度：１００％では、例えば、Ｓ８０８で抑制対象の周波数帯域を３０％の通過利得に設定している場合、制御部４０１は、３０％に対してフィルタの制御強度を１００％とし、３０％の通過利得でフィルタを動作させてよい。また、制御部４０１は、フィルタの制御強度：５０％では、３０％に対してフィルタの制御強度を５０％とし、１５％の通過利得でフィルタを動作させてよい。 In this case, the filter control strength is 0% when T is 10 seconds or more, and when T is 10 seconds or less the filter control strength increases as T becomes smaller. At a filter control strength of 100%, for example when the frequency band to be suppressed is set to a pass gain of 30% in S808, the control unit 401 may apply the control strength of 100% to that 30% and operate the filter with a pass gain of 30%. At a filter control strength of 50%, the control unit 401 may apply the control strength of 50% to the 30% and operate the filter with a pass gain of 15%.
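Formula 1 can be written directly as a small function. The sketch assumes that the result is clamped at 0% once T reaches the ramp length, which matches the statement that the strength is 0% for T of 10 seconds or more. The function and parameter names are hypothetical, and how the resulting strength is mapped to a pass gain is deliberately left out.

```python
def filter_control_strength(elapsed_s: float, total_s: float,
                            ramp_s: float = 10.0) -> float:
    """Filter control strength in percent, following Formula 1.

    elapsed_s is T1 (time since lip sync started) and total_s is the
    estimated duration of the utterance, so T2 = total_s - elapsed_s.
    """
    remaining_s = total_s - elapsed_s  # T2
    t = min(elapsed_s, remaining_s)    # T = min(T1, T2)
    return max(0.0, (ramp_s - t) / ramp_s * 100.0)
```

For a 30-second utterance, this gives 100% at the start, 50% after 5 seconds, 0% between 10 and 20 seconds, and a ramp back up to 100% at the end. For utterances shorter than twice the ramp length the strength never reaches 0%, because T cannot exceed half the utterance duration.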

例えば、以上で述べたように、フィルタをかける期間や強度を制御することで、リップシンクの期間においても、フィルタの強度を弱くしている期間では発話に応じて光源１０１を幅広い表現で発光させることができる。 For example, as described above, by controlling the period and intensity of filtering, the light source 101 is caused to emit light in a wide range of expressions according to the speech during the period when the intensity of the filter is weakened even during the lip-sync period. be able to.

また、例えば、図９（ｃ）で例示するように、リップシンクの前後のクリエイターの光アクションでリップシンクの期間における光源１０１の発光波形を制御する場合に、前と後のクリエイターの光アクションで使用している周波数が異なることもある。この場合、リップシンクの開始時に減衰させる周波数をリップシンクの前のクリエイターの光アクションから決定し、リップシンクの終了時に減衰させる周波数をリップシンクの後のクリエイターの光アクションから決定するというように、個別に決定してもよい。また、このようにリップシンクの開始時と終了時とで減衰させる周波数を個別に決定したとしても、図１０で述べたように、例えば、フィルタの制御強度を０％にするなど一旦弱めることで、減衰させる周波数が変わっても視聴者の違和感を抑えることができる。 Further, as illustrated in FIG. 9C for example, when the light emission waveform of the light source 101 during the lip-sync period is controlled based on the creator's light actions before and after the lip sync, the frequencies used in the preceding and following light actions may differ. In this case, the frequencies may be determined separately: the frequency to be attenuated at the start of the lip sync is determined from the creator's light action before the lip sync, and the frequency to be attenuated at the end of the lip sync is determined from the creator's light action after the lip sync. Even when the frequencies to be attenuated at the start and at the end of the lip sync are determined separately in this way, temporarily weakening the filter, for example by setting the filter control strength to 0% as described with reference to FIG. 10, keeps the change in the attenuated frequency from feeling unnatural to the viewer.

（第２の実施形態） (Second embodiment)
続いて、第２の実施形態を説明する。上述の実施形態では、クリエイターがコンテンツを生成する際などに、コンテンツの生成に用いるコンピュータなどの情報処理装置を生成装置４００として、リップシンクにおける光アクションの生成処理が実行される場合を例示している。しかしながら、実施形態はこれに限定されるものではなく、実施形態に係るリップシンクにおける光アクションの生成処理は、その他のタイミングおよびその他の装置において実行されてもよい。一例では、実施形態に係るリップシンクにおける光アクションの生成処理は、コミュニケーション装置１００において実行されてもよい。 Next, a second embodiment will be described. The embodiment described above illustrates the case where the light action generation processing for lip sync is executed, for example when a creator generates content, by an information processing device such as a computer used for generating the content, acting as the generation device 400. However, embodiments are not limited to this, and the light action generation processing for lip sync according to the embodiment may be executed at other timings and by other devices. In one example, the light action generation processing for lip sync according to the embodiment may be executed in the communication device 100.

図１１は、実施形態に係るコミュニケーション装置１００のブロック構成を例示する図である。コミュニケーション装置１００は、例えば、制御部１１０１、記憶部１１０２、光源制御部１１０３、および光源１０１を含む。制御部１１０１は、例えば特定部１１１１および調整部１１１２などを含む。コミュニケーション装置１００の記憶部１１０２は、例えば、上述のコンテンツ情報７００などの情報を記憶している。光源制御部１１０３は、例えば、制御部１１０１の指示に従って、光源１０１の発光強度を制御する。 FIG. 11 is a diagram illustrating the block configuration of the communication device 100 according to the embodiment. Communication device 100 includes, for example, control unit 1101 , storage unit 1102 , light source control unit 1103 , and light source 101 . The control unit 1101 includes, for example, a specifying unit 1111 and an adjusting unit 1112. The storage unit 1102 of the communication device 100 stores information such as the content information 700 described above, for example. The light source control unit 1103 controls the light emission intensity of the light source 101 according to instructions from the control unit 1101, for example.

図１２は、実施形態に係るコミュニケーション装置１００の制御部１１０１が実行するリップシンクにおける光アクションの生成処理の動作フローを例示する図である。例えば、コミュニケーション装置１００の制御部１１０１は、コンテンツの動作の実行指示が入力されると、図１２の動作フローを開始してよい。 FIG. 12 is a diagram illustrating an operational flow of light action generation processing in lip-sync performed by the control unit 1101 of the communication device 100 according to the embodiment. For example, the control unit 1101 of the communication device 100 may start the operation flow of FIG. 12 when an instruction to execute a content operation is input.

なお、Ｓ１２０１からＳ１２０７の処理は、図８のＳ８０１からＳ８０４、Ｓ８０７、およびＳ８０８の処理とそれぞれ対応していてよい。例えば、制御部１１０１は、Ｓ１２０１からＳ１２０７において、Ｓ８０１からＳ８０４、Ｓ８０７、およびＳ８０８の処理と同様の処理を実行してよい。なお、制御部１１０１は、Ｓ１２０１においてコンテンツ情報７００を記憶部１１０２から読み出してよい。 Note that the processing from S1201 to S1207 may correspond to the processing from S801 to S804, S807, and S808 in FIG. 8, respectively. For example, in S1201 to S1207, the control unit 1101 may perform the same processing as the processing in S801 to S804, S807, and S808. Note that the control unit 1101 may read the content information 700 from the storage unit 1102 in S1201.

In S1208, the control unit 1101 executes the operation of the content using the light emission waveform of the light source 101 for the lip-sync period determined in the processing of S1201 to S1207, and this operation flow ends.

As described above, the light emission control of the light source 101 during the lip-sync period according to the embodiment can also be executed in the communication device 100.
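The core of the flow, as summarized in the claims, is to identify a representative frequency component of the defined action's light emission waveform and then attenuate that component in the lip-sync waveform. The following is a minimal sketch of that idea, not the patented implementation. It assumes both waveforms are sampled at the same rate and have the same length so that frequency bins line up. The function names, the naive DFT, the "strongest non-DC bin" rule, and the `half_width` and `gain` parameters are all hypothetical choices made for illustration.

```python
import cmath
import math


def dft(samples):
    """Naive discrete Fourier transform (O(n^2)); adequate for short waveforms."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def inverse_dft(spectrum):
    """Inverse of dft(); returns the real part of each reconstructed sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def representative_bin(defined_waveform):
    """Assumed rule: the strongest non-DC bin represents the defined action."""
    spectrum = dft(defined_waveform)
    half = len(defined_waveform) // 2
    return max(range(1, half + 1), key=lambda k: abs(spectrum[k]))


def suppress_band(lip_sync_waveform, center_bin, half_width=1, gain=0.0):
    """Scale bins within half_width of center_bin (and their mirror bins) by gain."""
    n = len(lip_sync_waveform)
    spectrum = dft(lip_sync_waveform)
    for k in range(1, n):
        folded = min(k, n - k)  # map a mirror bin back to its positive frequency
        if abs(folded - center_bin) <= half_width:
            spectrum[k] *= gain
    return inverse_dft(spectrum)


# Hypothetical data: a defined action blinking at bin 4, and a lip-sync
# waveform that contains both bin 4 and bin 10.
N = 64
defined = [0.5 + 0.5 * math.sin(2 * math.pi * 4 * t / N) for t in range(N)]
lip_sync = [
    0.5
    + 0.3 * math.sin(2 * math.pi * 4 * t / N)
    + 0.2 * math.sin(2 * math.pi * 10 * t / N)
    for t in range(N)
]
rep = representative_bin(defined)
adjusted = suppress_band(lip_sync, rep)
```

In this sketch the suppressed band is the representative bin plus one neighbor on each side. A `gain` between 0 and 1 would attenuate the band rather than remove it.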

Note that the actions that the communication device 100 performs in the content may be generated dynamically. For example, the communication device 100 may dynamically select the actions to be executed in the content according to the viewer's age and gender detected by a sensor. As an example, if the viewers are children such as elementary school students, the device may play up-tempo, lively content such as "A lot of elementary school students came today. Let's dance with me!", whereas if the viewers are elderly, it may play relatively calm content. Even when content is generated dynamically in this way, the control unit 1101 may execute the light action generation processing for lip-sync according to the embodiment on the generated content and control the light emission of the light source 101 during the lip-sync period.
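The dynamic selection described above could be sketched as follows. This is only an illustration: the `Viewer` type, the age thresholds, and the content labels are hypothetical, and the sketch uses age alone even though the description also mentions gender. The selected content would then be passed to the light action generation processing for lip-sync.

```python
from dataclasses import dataclass


@dataclass
class Viewer:
    age: int  # hypothetical: age estimated from sensor data


def select_content(viewer, child_max_age=12, senior_min_age=65):
    """Pick a content label from the viewer's estimated age (assumed thresholds)."""
    if viewer.age <= child_max_age:
        return "upbeat"  # up-tempo, lively content for children
    if viewer.age >= senior_min_age:
        return "calm"  # relatively calm content for elderly viewers
    return "default"


label = select_content(Viewer(age=8))
```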

Although embodiments have been illustrated above, the embodiments are not limited to these. For example, the operation flows described above are examples, and the embodiments are not limited to them. Where possible, an operation flow may be executed with the order of its processes changed, may include additional processes, or may omit some processes. For example, the processing of S805 and S806 in FIG. 8 may be omitted if the creator is not to be prompted to make corrections.

In addition, although the embodiments above are described using the case where the communication device 100 is a robot as an example, the embodiments are not limited to this. For example, in another embodiment, the embodiment may be applied to a communication device 100 that includes a speaker and a light source and communicates with viewers. That is, the communication device 100 need not include movable joints such as arms, in which case the content information 700 need not include joint angle information. In the embodiments described above, pulse width modulation, for example, may be used to control the light emission intensity of the light source 101.
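As a rough illustration of controlling emission intensity by pulse width modulation, the sketch below converts a normalized intensity into the on-time of one PWM period. The description only states that pulse width modulation may be used, so the 1000-microsecond period, the 0 to 1 intensity range, and the function names are assumptions.

```python
def pwm_on_time_us(intensity, period_us=1000):
    """Return the on-time in microseconds for one PWM period.

    intensity is assumed to be normalized to [0, 1]; values outside that
    range are clamped.
    """
    clamped = min(1.0, max(0.0, intensity))
    return round(clamped * period_us)


def waveform_to_pwm(waveform, period_us=1000):
    """Map each sample of a light emission waveform to a PWM on-time."""
    return [pwm_on_time_us(v, period_us) for v in waveform]


on_times = waveform_to_pwm([0.0, 0.25, 1.0, 1.5])
```

A higher duty cycle (on-time divided by period) makes the light source appear brighter, so feeding the on-times to a hardware timer at a fixed period would reproduce the waveform as perceived brightness.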

Note that, in the embodiments described above, the control unit 401 of the generation device 400 operates as, for example, the identification unit 411 in the processing of S801 to S807 in FIG. 8, and as, for example, the adjustment unit 412 in the processing of S808. In the processing of S1201 to S1206 in FIG. 12, the control unit 1101 of the communication device 100 operates as, for example, the identification unit 1111, and in the processing of S1207 it operates as, for example, the adjustment unit 1112.

FIG. 13 is a diagram illustrating the hardware configuration of an information processing device 1300, such as a computer, for realizing the generation device 400 according to the embodiment. The hardware configuration for realizing the generation device 400 in FIG. 13 includes, for example, a processor 1301, a memory 1302, a storage device 1303, a reading device 1304, a communication interface 1306, and an input/output interface 1307. The processor 1301, the memory 1302, the storage device 1303, the reading device 1304, the communication interface 1306, and the input/output interface 1307 are connected to one another via, for example, a bus 1308.

The processor 1301 may be, for example, a single processor, a multiprocessor, or a multi-core processor. The processor 1301 provides some or all of the functions of the control unit 401 described above by using the memory 1302 to execute a program describing, for example, the procedure of the operation flow of FIG. 8. For example, the processor 1301 of the generation device 400 may operate as the identification unit 411 and the adjustment unit 412 described above by reading and executing a program stored in the storage device 1303.

The memory 1302 is, for example, a semiconductor memory and may include a RAM area and a ROM area. The storage device 1303 is, for example, a hard disk, a semiconductor memory such as a flash memory, or an external storage device. RAM is an abbreviation for Random Access Memory, and ROM is an abbreviation for Read Only Memory.

The reading device 1304 accesses the removable storage medium 1305 according to instructions from the processor 1301. The removable storage medium 1305 is realized by, for example, a semiconductor device (such as a USB memory), a medium to and from which information is input and output by magnetic action (such as a magnetic disk), or a medium to and from which information is input and output by optical action (such as a CD-ROM or DVD). USB is an abbreviation for Universal Serial Bus, CD for Compact Disc, and DVD for Digital Versatile Disk. The storage unit 402 described above includes, for example, the memory 1302, the storage device 1303, and the removable storage medium 1305. For example, the content information 700 may be stored in the storage device 1303 of the generation device 400.

The communication interface 1306 transmits and receives data to and from a network and other devices according to instructions from the processor 1301. The input/output interface 1307 may be, for example, an interface to input devices and output devices. An input device is, for example, a device such as a keyboard or mouse that receives instructions from the user. An output device is, for example, a display device such as a display or an audio device such as a speaker.

FIG. 14 is a diagram illustrating a hardware configuration for realizing the communication device 100 according to the embodiment. The hardware configuration for realizing the communication device 100 in FIG. 14 includes an information processing device 1400, a sound output device 1409, a light source control circuit 1411, a light source 101, a drive control circuit 1421, and a drive circuit 1422. The information processing device 1400 includes, for example, a processor 1401, a memory 1402, a communication interface 1406, and an input/output interface 1407. The processor 1401, the memory 1402, the communication interface 1406, and the input/output interface 1407 are connected to one another via, for example, a bus 1408.

The processor 1401 may be, for example, a single processor, a multiprocessor, or a multi-core processor. The processor 1401 provides some or all of the functions of the control unit 1101 described above by using the memory 1402 to execute a program describing, for example, the procedure of the operation flow of FIG. 12. For example, the processor 1401 of the communication device 100 may operate as the identification unit 1111 and the adjustment unit 1112 described above by reading and executing a program stored in the memory 1402.

The memory 1402 is, for example, a semiconductor memory and may include a RAM area and a ROM area. The memory 1402 is an example of the storage unit 1102 described above. For example, the content information 700 may be stored in the memory 1402 of the communication device 100.

The communication interface 1406 transmits and receives data to and from a network and other devices according to instructions from the processor 1401. The input/output interface 1407 may be, for example, an interface to input devices and output devices. A sound output device 1409 such as a speaker is connected to the input/output interface 1407, for example. The sound output device 1409 may, for example, output speech that utters the character string registered under the utterance in the content information 700, according to instructions from the processor 1401. An input device may be, for example, a device such as a button or touch panel that receives instructions from the user.

The light source control circuit 1411 controls the light emission of the light source 101, such as an LED, according to instructions from the processor 1401. The light source control circuit 1411 is an example of the light source control unit 1103 described above. The drive control circuit 1421 controls, according to instructions from the processor 1401, the drive circuit 1422, such as a motor, that drives the joints of the communication device 100. If the hardware configuration of FIG. 14 is that of a smart speaker or the like rather than a robot, the drive control circuit 1421 and the drive circuit 1422 may be omitted, for example.

Each program according to the embodiment is provided to the generation device 400 and the communication device 100 in, for example, the following forms.
(1) Pre-installed in the storage device 1303 and the memory 1402.
(2) Provided on the removable storage medium 1305.
(3) Provided from a server such as a program server.

Note that the hardware configurations described with reference to FIGS. 13 and 14 are examples, and the embodiments are not limited to them. For example, some or all of the functions of the control unit 401 and the control unit 1101 described above may be implemented as hardware using an FPGA, an SoC, or the like. FPGA is an abbreviation for Field Programmable Gate Array, and SoC is an abbreviation for System-on-a-chip.

Several embodiments have been described above. However, the embodiments are not limited to those described above and should be understood to encompass various modifications and alternatives of them. For example, it will be understood that the various embodiments can be embodied with modified constituent elements without departing from their spirit and scope. It will also be understood that various embodiments can be implemented by appropriately combining a plurality of the constituent elements disclosed in the embodiments described above. Furthermore, those skilled in the art will understand that various embodiments can be implemented by deleting or replacing some of the constituent elements shown in the embodiments, or by adding constituent elements to those shown in the embodiments.

100: communication device
101: light source
400: generation device
401: control unit
402: storage unit
411: identification unit
412: adjustment unit
1101: control unit
1102: storage unit
1103: light source control unit
1111: identification unit
1112: adjustment unit
1300: information processing device
1301: processor
1302: memory
1303: storage device
1304: reading device
1305: removable storage medium
1306: communication interface
1307: input/output interface
1308: bus
1400: information processing device
1401: processor
1402: memory
1406: communication interface
1407: input/output interface
1408: bus
1409: sound output device
1411: light source control circuit
1421: drive control circuit
1422: drive circuit

Claims

An information processing device comprising:
an identification unit that analyzes frequency components of the light emission waveform in a defined action of content, the content including the defined action, which causes a light source to emit light with a defined light emission waveform, and a lip-sync action, which causes the light source to emit light in response to an uttered voice, and that identifies a representative frequency component representing the light emission waveform; and
an adjustment unit that adjusts the frequency used in the light emission waveform of the light source in the lip-sync action to a frequency distinguishable from the representative frequency component.

The information processing device according to claim 1, wherein the adjustment unit:
determines, based on the representative frequency component, a frequency component to be suppressed that includes the representative frequency component; and
generates the light emission waveform of the light source in the lip-sync action by attenuating the frequency component to be suppressed among the frequency components included in a light emission waveform of the light source generated in response to the uttered voice.

The identification unit outputs warning information prompting correction of the defined action when the frequency components of the light emission waveform in the defined action are not biased so as to satisfy a predetermined condition. The information processing device according to .

The information processing device according to claim 2, wherein the identification unit identifies the representative frequency component based on the light emission waveform of a first defined action within a predetermined period from the timing of switching between the first defined action and the lip-sync action, the first defined action being the one of the defined actions that is executed immediately before the lip-sync action.

The information processing device according to claim 3, wherein the identification unit determines whether there is a bias that satisfies the predetermined condition based on the light emission waveform of a first defined action within a predetermined period from the timing of switching between the first defined action and the lip-sync action, the first defined action being the one of the defined actions that is executed immediately before the lip-sync action.

The information processing device according to claim 4 or 5, wherein the adjustment unit sets the strength with which the frequency component to be suppressed is attenuated to a first strength at the timing of switching between the first defined action and the lip-sync action, and sets the strength to a second strength, which attenuates the frequency component to be suppressed more weakly than the first strength, at a second timing after a predetermined time has elapsed from the timing of switching to the lip-sync action.

The information processing device according to any one of claims 2 to 4, wherein the identification unit identifies the representative frequency component based on the light emission waveform of a second defined action within a predetermined period from the timing of switching between the second defined action and the lip-sync action, the second defined action being the one of the defined actions that is executed immediately after the lip-sync action.

The information processing device according to claim 3, wherein the identification unit determines whether there is a bias that satisfies the predetermined condition based on the light emission waveform of a second defined action within a predetermined period from the timing of switching between the second defined action and the lip-sync action, the second defined action being the one of the defined actions that is executed immediately after the lip-sync action.

The information processing device according to claim 7 or 8, wherein the adjustment unit sets the strength with which the frequency component to be suppressed is attenuated to a first strength at the timing of switching between the second defined action and the lip-sync action, and sets the strength to a second strength, which attenuates the frequency component to be suppressed more weakly than the first strength, at a second timing that is a predetermined time before the timing of switching between the second defined action and the lip-sync action.

A light action generation method executed by an information processing device, the method comprising:
analyzing frequency components of the light emission waveform in a defined action of content, the content including the defined action, which causes a light source to emit light with a defined light emission waveform, and a lip-sync action, which causes the light source to emit light in response to an uttered voice, and identifying a representative frequency component representing the light emission waveform; and
adjusting the frequency used in the light emission waveform of the light source in the lip-sync action to a frequency distinguishable from the representative frequency component.

A light action generation program that causes an information processing device to execute processing comprising:
analyzing frequency components of the light emission waveform in a defined action of content, the content including the defined action, which causes a light source to emit light with a defined light emission waveform, and a lip-sync action, which causes the light source to emit light in response to an uttered voice, and identifying a representative frequency component representing the light emission waveform; and
adjusting the frequency used in the light emission waveform of the light source in the lip-sync action to a frequency distinguishable from the representative frequency component.