JP2022531056A

JP2022531056A - Method, apparatus, device, and recording medium for driving an interactive object

Info

Publication number: JP2022531056A
Application number: JP2021549865A
Authority: JP
Inventors: 林 ▲孫▼
Original assignee: Beijing Sensetime Technology Development Co Ltd
Current assignee: Beijing Sensetime Technology Development Co Ltd
Priority date: 2020-03-31
Filing date: 2020-11-18
Publication date: 2022-07-06
Also published as: KR20210124306A; SG11202109201XA; TWI759039B; TW202138987A; CN111459451A; WO2021196647A1

Abstract

Disclosed are a method, apparatus, device, and recording medium for driving an interactive object. The method includes: obtaining voice driving data of an interactive object displayed on a display device; obtaining, based on target data contained in the voice driving data, a control parameter sequence of a predetermined action of the interactive object that matches the target data; and controlling the interactive object to perform the predetermined action based on the obtained control parameter sequence. [Selected drawing] FIG. 2

Description

The present invention relates to the field of computer technology and, more specifically, to a method, apparatus, device, and recording medium for driving an interactive object.

In most forms of human-computer interaction, the user provides input by keystroke, touch, or voice, and the device responds by displaying images, text, or a virtual character on a screen. At present, virtual characters are mostly improvements built on voice assistants, and interaction between users and virtual characters remains superficial.

Embodiments of the present invention provide a technical solution for driving an interactive object.

According to one aspect of the present invention, a method of driving an interactive object is provided. The method includes: obtaining voice driving data of an interactive object displayed on a display device; obtaining, based on target data contained in the voice driving data, a control parameter sequence of a predetermined action of the interactive object that matches the target data; and controlling the interactive object to perform the predetermined action based on the obtained control parameter sequence.

In combination with any embodiment provided by the present invention, the method further includes: controlling the display device to output speech based on speech information corresponding to the voice driving data, and/or displaying text based on text information corresponding to the voice driving data.

In combination with any embodiment provided by the present invention, controlling the interactive object to perform the predetermined action based on the obtained control parameter sequence includes: determining speech information corresponding to the target data; obtaining time information for outputting the speech information; determining, based on the time information, an execution time of the predetermined action corresponding to the target data; and controlling, according to the execution time, the interactive object to perform the predetermined action using the control parameter sequence corresponding to the target data.

In combination with any embodiment provided by the present invention, the control parameter sequence includes one or more sets of control parameters, and controlling, according to the execution time, the interactive object to perform the predetermined action using the control parameter sequence corresponding to the target data includes: calling each set of control parameters in the control parameter sequence at a predetermined rate so that the interactive object displays the posture corresponding to each set of control parameters.

In combination with any embodiment provided by the present invention, the control parameter sequence includes one or more sets of control parameters, and controlling, according to the execution time, the interactive object to perform the predetermined action using the control parameter sequence corresponding to the target data includes: determining a call rate for the control parameter sequence based on the execution time; and calling each set of control parameters in the control parameter sequence at that call rate so that the interactive object outputs the posture corresponding to each set of control parameters.
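As a rough illustration of the call-rate embodiment above, the following Python sketch derives a call rate from the execution time and spaces the calls so that the whole sequence fits into that time. The function names, the use of seconds, and the uniform spacing are assumptions made for illustration; the source does not state how the call rate is computed.

```python
from typing import List


def call_rate(num_param_sets: int, execution_time_s: float) -> float:
    """Sets of control parameters to call per second (hypothetical helper)."""
    if num_param_sets <= 0 or execution_time_s <= 0:
        raise ValueError("need at least one parameter set and a positive execution time")
    return num_param_sets / execution_time_s


def call_times(num_param_sets: int, start_s: float, execution_time_s: float) -> List[float]:
    """Uniformly spaced call times that fit the sequence into the execution time."""
    rate = call_rate(num_param_sets, execution_time_s)
    return [start_s + i / rate for i in range(num_param_sets)]
```

Under these assumptions, a sequence of 30 parameter sets that must finish within 1.5 seconds would be called at 20 sets per second.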

In combination with any embodiment provided by the present invention, controlling, according to the execution time, the interactive object to perform the predetermined action using the control parameter sequence corresponding to the target data includes: beginning to call the control parameter sequence corresponding to the target data at a predetermined point in time before the speech information corresponding to the target data is output, so that the interactive object begins to perform the predetermined action.
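The early-start embodiment above can be pictured with a small Python sketch. The fixed lead time in seconds and the clamping at zero are assumptions for illustration only; the source says only that calling begins at a predetermined point before the corresponding speech is output.

```python
def action_start_time(speech_start_s: float, lead_s: float) -> float:
    """Time at which to begin calling the control parameter sequence.

    speech_start_s: when the speech for the target data starts playing.
    lead_s: hypothetical lead time by which the action precedes the speech.
    """
    if lead_s < 0:
        raise ValueError("lead time must be non-negative")
    # Clamp so the action never starts before the beginning of playback.
    return max(0.0, speech_start_s - lead_s)
```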

In combination with any embodiment provided by the present invention, the voice driving data includes a plurality of pieces of target data, and controlling the interactive object to perform the predetermined action based on the obtained control parameter sequence includes: in response to detecting that adjacent pieces of target data among the plurality overlap, controlling the interactive object to perform the predetermined action based on the control parameter sequence corresponding to the piece of target data that comes first in word order.
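One way to read the overlap rule above is sketched below in Python: each match is a character span in the text, and when two adjacent spans overlap, the earlier one wins. The `(start, end, action)` tuple layout with an exclusive end is an assumption for illustration and does not come from the source.

```python
from typing import List, Tuple

Match = Tuple[int, int, str]  # (start, end_exclusive, action name), hypothetical layout


def resolve_overlaps(matches: List[Match]) -> List[Match]:
    """Keep the earlier match whenever adjacent matches overlap."""
    kept: List[Match] = []
    for match in sorted(matches, key=lambda m: m[0]):
        if kept and match[0] < kept[-1][1]:
            continue  # overlaps the previously kept (earlier) match
        kept.append(match)
    return kept
```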

In combination with any embodiment provided by the present invention, the voice driving data includes a plurality of pieces of target data, and controlling the interactive object to perform the predetermined action based on the control parameter sequence corresponding to the target data includes: in response to detecting that the execution times of the control parameter sequences corresponding to adjacent pieces of target data overlap, fusing the overlapping portions of those control parameter sequences.
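The source does not say how the overlapping portions are fused. The Python sketch below assumes a linear cross-fade over the overlapping frames, with each set of control parameters represented as a list of floats; both the blending rule and the representation are illustrative assumptions.

```python
from typing import List

ParamSet = List[float]  # one set of control parameters (assumed representation)


def fuse(seq_a: List[ParamSet], seq_b: List[ParamSet], overlap: int) -> List[ParamSet]:
    """Join two sequences, cross-fading the last `overlap` sets of seq_a
    with the first `overlap` sets of seq_b."""
    if overlap < 0 or overlap > min(len(seq_a), len(seq_b)):
        raise ValueError("overlap must fit inside both sequences")
    head = seq_a[: len(seq_a) - overlap]
    tail = seq_b[overlap:]
    blended: List[ParamSet] = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of seq_b grows across the overlap
        a = seq_a[len(seq_a) - overlap + i]
        b = seq_b[i]
        blended.append([(1 - w) * x + w * y for x, y in zip(a, b)])
    return head + blended + tail
```

With an overlap of zero the function simply concatenates the two sequences.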

In combination with any embodiment provided by the present invention, obtaining, based on target data contained in the voice driving data, the control parameter sequence of the predetermined action of the interactive object that matches the target data includes: in response to the voice driving data including audio data, performing speech recognition on the audio data and determining the target data contained in the audio data based on the recognized speech content; and, in response to the voice driving data including text data, determining the target data contained in the text data based on the text content of the text data.

In combination with any embodiment provided by the present invention, the voice driving data includes syllable data, and obtaining, based on target data contained in the voice driving data, the control parameter sequence of the predetermined action of the interactive object that matches the target data includes: determining whether the syllable data contained in the voice driving data matches target syllable data, where the target syllable data belongs to one of a plurality of pre-divided syllable types, different syllable types correspond to different predetermined mouth shapes, and a corresponding control parameter sequence is set for each predetermined mouth shape; and, in response to the syllable data matching the target syllable data, obtaining, based on the syllable type to which the matched target syllable data belongs, the control parameter sequence of the predetermined mouth shape corresponding to the matched target syllable data.

In combination with any embodiment provided by the present invention, the method further includes: obtaining first data in the voice driving data other than the target data; obtaining acoustic features of the first data; obtaining posture control parameters that match the acoustic features; and controlling the posture of the interactive object based on the posture control parameters.

According to one aspect of the present invention, an apparatus for driving an interactive object is provided. The apparatus includes: a first obtaining unit for obtaining voice driving data of an interactive object displayed on a display device; a second obtaining unit for obtaining, based on target data contained in the voice driving data, a control parameter sequence of a predetermined action of the interactive object that matches the target data; and a driving unit for controlling the interactive object to perform the predetermined action based on the obtained control parameter sequence.

In combination with any embodiment provided by the present invention, the apparatus further includes an output unit for controlling the display device to output speech based on speech information corresponding to the voice driving data, and/or for displaying text based on text information corresponding to the voice driving data.

In combination with any embodiment provided by the present invention, the driving unit is specifically configured to: determine speech information corresponding to the target data; obtain time information for outputting the speech information; determine, based on the time information, an execution time of the predetermined action corresponding to the target data; and control, according to the execution time, the interactive object to perform the predetermined action using the control parameter sequence corresponding to the target data.

In combination with any embodiment provided by the present invention, the control parameter sequence includes one or more sets of control parameters, and the driving unit, when controlling, according to the execution time, the interactive object to perform the predetermined action using the control parameter sequence corresponding to the target data, is specifically configured to call each set of control parameters in the control parameter sequence at a predetermined rate so that the interactive object displays the posture corresponding to each set of control parameters.

In combination with any embodiment provided by the present invention, the control parameter sequence includes one or more sets of control parameters, and the driving unit, when controlling, according to the execution time, the interactive object to perform the predetermined action using the control parameter sequence corresponding to the target data, is specifically configured to: determine a call rate for the control parameter sequence based on the execution time; and call each set of control parameters in the control parameter sequence at that call rate so that the interactive object outputs the posture corresponding to each set of control parameters.

In combination with any embodiment provided by the present invention, the control parameter sequence includes one or more sets of control parameters, and the driving unit, when controlling, according to the execution time, the interactive object to perform the predetermined action using the control parameter sequence corresponding to the target data, is specifically configured to begin calling the control parameter sequence corresponding to the target data at a predetermined point in time before the speech information corresponding to the target data is output, so that the interactive object begins to perform the predetermined action.

In combination with any embodiment provided by the present invention, the voice driving data includes a plurality of pieces of target data, and the driving unit is specifically configured to, in response to detecting that adjacent pieces of target data among the plurality overlap, control the interactive object to perform the predetermined action based on the control parameter sequence corresponding to the piece of target data that comes first in word order.

In combination with any embodiment provided by the present invention, the voice driving data includes a plurality of pieces of target data, and the driving unit is specifically configured to, in response to detecting that the execution times of the control parameter sequences corresponding to adjacent pieces of target data overlap, fuse the overlapping portions of those control parameter sequences.

In combination with any embodiment provided by the present invention, the second obtaining unit is specifically configured to: in response to the voice driving data including audio data, perform speech recognition on the audio data and determine the target data contained in the audio data based on the speech content contained in the audio data; and, in response to the voice driving data including text data, determine the target data contained in the text data based on the text content of the text data.

In combination with any embodiment provided by the present invention, the voice driving data includes syllable data, and the second obtaining unit is specifically configured to: determine whether the syllable data contained in the voice driving data matches target syllable data, where the target syllable data belongs to one of a plurality of pre-divided syllable types, different syllable types correspond to different predetermined mouth shapes, and a corresponding control parameter sequence is set for each predetermined mouth shape; and, in response to the syllable data matching the target syllable data, obtain, based on the syllable type to which the matched target syllable data belongs, the control parameter sequence of the predetermined mouth shape corresponding to the matched target syllable data.

In combination with any embodiment provided by the present invention, the apparatus further includes a posture control unit for: obtaining first data in the voice driving data other than the target data; obtaining acoustic features of the first data; obtaining posture control parameters that match the acoustic features of the first data; and controlling the posture of the interactive object based on the posture control parameters.

According to one aspect of the present invention, an electronic device is provided. The device includes a memory and a processor; the memory stores computer instructions executable on the processor, and the processor, when executing the computer instructions, implements the method of driving an interactive object according to any embodiment provided by the present invention.

According to one aspect of the present invention, a computer-readable recording medium storing a computer program is provided. When the program is executed by a processor, the method of driving an interactive object according to any embodiment provided by the present invention is implemented.

According to the method, apparatus, device, and computer-readable recording medium for driving an interactive object of one or more embodiments of the present invention, control parameters of a predetermined action of the interactive object are obtained based on at least one piece of target data contained in the voice driving data of the interactive object displayed on a display device, the control parameters matching that target data, and the action of the interactive object displayed on the display device is controlled accordingly. The interactive object thus performs the action corresponding to the target data contained in the voice driving data, which makes its speaking state natural and vivid and improves the target object's interactive experience.

A schematic diagram of a display device in the method of driving an interactive object provided by an embodiment of the present invention.

A flowchart of the method of driving an interactive object provided by an embodiment of the present invention.

A flowchart of the method of driving an interactive object provided by an embodiment of the present invention.

A flowchart of the method of driving an interactive object provided by an embodiment of the present invention.

A schematic diagram of the structure of the apparatus for driving an interactive object provided by an embodiment of the present invention.

A schematic diagram of the structure of the electronic device provided by an embodiment of the present invention.

Exemplary embodiments are described in detail below, with examples shown in the drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise stated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present invention. Rather, they are merely examples of apparatuses and methods consistent with some aspects of the present invention as recited in the appended claims.

The term "and/or" in this specification merely describes an association between related objects and indicates that three relationships are possible. For example, "A and/or B" covers three cases: A alone, both A and B, and B alone. The term "at least one" in this specification means any one of a plurality of items or any combination of at least two of them. For example, "including at least one of A, B, and C" means including any one or more elements selected from the set consisting of A, B, and C.

At least one embodiment of the present invention provides a method of driving an interactive object. The method may be executed by an electronic device such as a terminal device or a server. The terminal device may be a fixed or mobile terminal such as a mobile phone, tablet computer, game console, desktop computer, advertising machine, all-in-one machine, or in-vehicle terminal. The server may be a local server, a cloud server, or the like. The method may be implemented by a processor calling computer-readable instructions stored in a memory.

In embodiments of the present invention, the interactive object may be any virtual image capable of interacting with a target object. In one embodiment, the interactive object may be a virtual character, or another virtual image capable of interactive functions, such as a virtual animal, a virtual item, or a cartoon image. The interactive object may be displayed in 2D or 3D; the present invention is not limited in this respect. The target object may be a user, a robot, or another smart device. The interaction between the interactive object and the target object may be active or passive. In one example, the target object issues a request by making a gesture or body movement, thereby triggering the interactive object to interact in an active manner. In another example, the interactive object actively greets the target object and prompts it to perform an action, so that the target object interacts with the interactive object in a passive manner.

The interactive object may be displayed using a terminal device. The terminal device may be a television, an all-in-one machine with a display function, a projector, a virtual reality (VR) device, an augmented reality (AR) device, or the like; the present invention does not limit the specific form of the terminal device.

FIG. 1 shows a display device according to an embodiment of the present invention. As shown in FIG. 1, the display device has a display screen and, by displaying a stereoscopic image on the screen, can present a virtual scene and an interactive object with a stereoscopic effect. For example, the interactive object displayed on the display screen in FIG. 1 includes a virtual cartoon character.

The electronic device described in the present invention may include a built-in display and use it to display a stereoscopic image that presents the virtual scene and the interactive object. In other embodiments, the electronic device need not include a built-in display; instead, it can send the content to be displayed over a wired or wireless connection and instruct an external display to show the virtual scene and the interactive object.

In some embodiments, in response to the electronic device receiving voice driving data for driving the interactive object to output speech, the interactive object can utter the specified speech to the target object. The terminal device can generate voice driving data based on the actions, facial expressions, identity, preferences, and so on of a target object near the terminal device, and thereby drive the interactive object to utter the specified speech in conversation or in response, providing an anthropomorphic service to the target object. During interaction between the interactive object and the target object, it may not be possible, while driving the interactive object to utter the specified speech based on the voice driving data, to also drive it to make facial movements synchronized with that speech. The interactive object then appears stiff and unnatural when speaking, which harms the target object's interactive experience. In view of this, at least one embodiment of the present invention proposes a method of driving an interactive object that improves the target object's experience of interacting with the interactive object.

FIG. 2 is a flowchart of a method of driving an interactive object according to an embodiment of the present invention. As shown in FIG. 2, the method includes steps 201 to 203.

In step 201, voice driving data of the interactive object displayed on the display device is obtained.

In embodiments of the present invention, the voice driving data may include audio data (speech data), text data, and the like. The voice driving data may be driving data generated by the terminal device based on the actions, facial expressions, identity, preferences, and so on of the target object interacting with the interactive object, or it may be driving data obtained directly by the electronic device, such as voice driving data retrieved from internal memory. The present invention does not limit how the voice driving data is obtained.

In step 202, based on target data contained in the voice driving data, a control parameter sequence of a predetermined action of the interactive object that matches the target data is obtained, where the control parameter sequence includes one or more sets of control parameters.

In embodiments of the present invention, target data is data that has been matched in advance with a predetermined action. Because the predetermined action is realized under the control of a corresponding control parameter sequence, the target data is matched with the control parameter sequence of that predetermined action. The target data may be a predetermined keyword, word, sentence, or the like. Taking the keyword "wave" as an example: if the voice driving data contains text data, the target data corresponding to "wave" is the text "wave"; and/or, if the voice driving data contains audio or syllable data, the target data corresponding to "wave" may be the speech data for "wave". When the voice driving data matches such target data, the voice driving data can be determined to contain the target data.
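For the text case, the matching described above can be sketched in Python as a keyword lookup. The library contents, the parameter values, and the case-insensitive substring search are all hypothetical choices for illustration; the source does not prescribe a matching algorithm.

```python
from typing import Dict, List, Tuple

# Hypothetical library: keyword (target data) -> control parameter sequence.
ACTION_LIBRARY: Dict[str, List[List[float]]] = {
    "wave": [[0.0, 0.0], [0.3, 0.1], [0.6, 0.2]],
    "nod": [[0.0], [0.2], [0.0]],
}


def find_target_data(text: str,
                     library: Dict[str, List[List[float]]] = ACTION_LIBRARY
                     ) -> List[Tuple[int, str]]:
    """Return (position, keyword) for every keyword occurrence, in text order."""
    lowered = text.lower()
    hits: List[Tuple[int, str]] = []
    for keyword in library:
        pos = lowered.find(keyword)
        while pos != -1:
            hits.append((pos, keyword))
            pos = lowered.find(keyword, pos + 1)
    return sorted(hits)
```

Each returned keyword can then be used to look up its control parameter sequence in the library.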

The predetermined action can be realized with a general-purpose unit animation. The unit animation may include a sequence of image frames, each of which corresponds to one posture of the interactive object, and the change in posture from frame to frame makes the interactive object perform the predetermined action. The posture of the interactive object in one image frame can be realized with one set of control parameters, for example a set of control parameters formed by the displacements of a plurality of bone points. A control parameter sequence formed by multiple sets of control parameters can therefore be used to control the change in posture of the interactive object, and thus to control the interactive object to perform the predetermined action.
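One way to picture this data structure is sketched below, assuming that a set of control parameters maps bone point names to 3D displacements from a rest pose. The type names and the functions `apply_params` and `play` are illustrative assumptions, not part of the described method.

```python
from typing import Dict, Iterator, List, Tuple

Vec3 = Tuple[float, float, float]
ControlParams = Dict[str, Vec3]             # one set: bone point -> displacement
ControlParamSequence = List[ControlParams]  # one set per image frame


def apply_params(rest_pose: Dict[str, Vec3], params: ControlParams) -> Dict[str, Vec3]:
    """Return the posture obtained by displacing bone points of the rest pose."""
    posture = dict(rest_pose)
    for bone, (dx, dy, dz) in params.items():
        x, y, z = posture[bone]
        posture[bone] = (x + dx, y + dy, z + dz)
    return posture


def play(rest_pose: Dict[str, Vec3], sequence: ControlParamSequence) -> Iterator[Dict[str, Vec3]]:
    """Yield one posture per set of control parameters in the sequence."""
    for params in sequence:
        yield apply_params(rest_pose, params)
```

Each yielded posture corresponds to one image frame, so iterating over the sequence reproduces the predetermined action.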

In some embodiments, the target data may include target syllable data, and the target syllable data corresponds to control parameters of a predetermined mouth shape. Each piece of target syllable data belongs to one of several syllable types divided in advance; different syllable types correspond to different predetermined mouth shapes, and a corresponding control parameter sequence is set for each predetermined mouth shape.

Syllable data may be a speech unit formed by combining at least one phoneme, and may be syllable data of a pinyin language or of a non-pinyin language (for example, Chinese). A syllable type may group syllable data whose pronunciation actions are identical or essentially identical, and each syllable type corresponds to one kind of action of the interactive object. Specifically, each syllable type corresponds to one predetermined mouth shape used when the interactive object speaks, that is, to one kind of pronunciation action. In this case, syllable data of the same type matches the control parameter sequence of the same predetermined mouth shape. For example, pinyin syllables such as "ma", "man", and "mang" can be regarded as the same type because their pronunciation actions are essentially the same, and all of them correspond to the control parameter sequence of the "mouth open" mouth shape when the interactive object speaks. When target syllable data of this kind is detected in the voice-driven data, the interactive object can be controlled to make the corresponding mouth shape based on the mouth-shape control parameter sequence that matches the target syllable data. Furthermore, control parameter sequences of several different mouth shapes can be matched based on syllable data of several types, and these control parameter sequences can be used to control the change of the interactive object's mouth shape so that the interactive object presents an anthropomorphic speaking state.
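The grouping of syllables into types can be sketched as two lookup tables. Only the "ma"/"man"/"mang" group and its "mouth open" shape come from the description; the second group, the parameter name `jaw_open`, and all numeric values are made-up placeholders.

```python
from typing import Dict, List, Optional

# Syllable type -> syllables whose pronunciation actions essentially coincide.
SYLLABLE_TYPES = {
    "mouth_open": {"ma", "man", "mang"},   # example from the description
    "lips_rounded": {"wo", "wu"},          # hypothetical second type
}

# Syllable type -> mouth-shape control parameter sequence (placeholder values).
MOUTH_SEQUENCES: Dict[str, List[Dict[str, float]]] = {
    "mouth_open": [{"jaw_open": 0.3}, {"jaw_open": 0.8}, {"jaw_open": 0.2}],
    "lips_rounded": [{"jaw_open": 0.2}, {"jaw_open": 0.4}, {"jaw_open": 0.1}],
}


def mouth_sequence_for(syllable: str) -> Optional[List[Dict[str, float]]]:
    """Return the mouth-shape sequence of the syllable's type, or None if it is not target data."""
    for type_name, members in SYLLABLE_TYPES.items():
        if syllable in members:
            return MOUTH_SEQUENCES[type_name]
    return None
```

Because the lookup goes through the type, adding a new syllable to an existing type needs no new control parameter sequence.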

In step 203, the interactive object is controlled to perform the predetermined action based on the obtained control parameter sequence.

For each of the one or more pieces of target data included in the voice-driven data, the control parameter sequence of the corresponding predetermined action can be obtained. By controlling the actions of the interactive object based on the obtained control parameter sequences, the predetermined action corresponding to each piece of target data in the voice-driven data can be realized.

In the embodiments of the present invention, a control parameter sequence of a predetermined action of the interactive object that matches the target data is acquired based on the target data included in the voice-driven data of the interactive object displayed on the display device, and the actions of the interactive object displayed on the display device are controlled accordingly. The interactive object thus performs actions that correspond to the target data included in the voice-driven data, which makes its speaking state natural and vivid and improves the target object's interactive experience.

FIG. 3 is a flowchart of a method for driving an interactive object according to an embodiment of the present invention. As shown in FIG. 3, the method includes the following step.

In step 204, the display device is controlled to output voice based on the voice information corresponding to the voice-driven data, and/or to display text based on the text information corresponding to the voice-driven data.

While the display device is controlled to output the voice corresponding to the voice-driven data, the interactive object is controlled to perform the corresponding actions in order, based on the control parameter sequence matching each piece of target data in the voice-driven data. The interactive object thus outputs voice and, at the same time, acts according to the content of that voice, which makes its speaking state natural and vivid and improves the target object's interactive experience.

Furthermore, while the display device is controlled to output the voice corresponding to the voice-driven data, the text corresponding to the voice-driven data can also be displayed on the display device, and the interactive object can be controlled to perform the corresponding actions in order, based on the control parameter sequence matching each piece of target data in the voice-driven data. The interactive object thus outputs voice and displays text and, at the same time, acts according to the content of the voice and the text, which makes the state it expresses natural and vivid and improves the target object's interactive experience.

In the embodiments of the present invention, an image frame sequence corresponding to variable content can be composed merely by setting control parameter sequences for the specified actions, which improves the efficiency of driving the interactive object. In addition, target data can be added or modified as needed to respond to changing content, which makes the driving system easy to maintain and update.

In some embodiments, the method is applied to a server, which may be a local server, a cloud server, or the like. The server processes the voice-driven data of the interactive object to generate posture parameter values of the interactive object, and renders with a three-dimensional or two-dimensional rendering engine based on the posture parameter values to obtain a response animation of the interactive object. The server can send the response animation to a terminal device for display in order to respond to the target object, or it can send the response animation to the cloud so that the terminal device obtains the response animation from the cloud and responds to the target object. After generating the posture parameter values of the interactive object, the server can also send the posture parameter values to the terminal, so that the terminal completes the rendering, animation generation, and display.

In some embodiments, the method is applied to a terminal device. The terminal device processes the voice-driven data of the interactive object to generate posture parameter values of the interactive object, renders with a three-dimensional or two-dimensional rendering engine based on the posture parameter values to obtain a response animation of the interactive object, and displays the response animation to respond to the target object.

In response to the voice-driven data including audio data, speech recognition can be performed on the voice-driven data to obtain the speech content of the audio data. The target data included in the voice-driven data can then be determined by matching the speech content against the target data.

In response to the voice-driven data including text data, the target data included in the text data can be determined based on the text content of the text data.

In some embodiments, when the voice-driven data includes syllable data, the voice-driven data can be segmented to obtain at least one piece of syllable data. Those skilled in the art should understand that there is more than one way to segment the voice-driven data and that different segmentation methods may yield different combinations of syllable data. Priorities can be assigned to the segmentation methods, and the combination of syllable data produced by the higher-priority method can be used as the segmentation result.
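The following sketch illustrates prioritized segmentation under stated assumptions: the two methods (greedy longest match and greedy shortest match), the syllable inventory `SYLLABLES`, and the priority values are all hypothetical, since the description does not name specific segmentation methods.

```python
from typing import List, Optional, Set

SYLLABLES: Set[str] = {"xi", "an", "xian"}  # hypothetical syllable inventory


def greedy_segment(text: str, inventory: Set[str], prefer_long: bool) -> Optional[List[str]]:
    """Split text into syllables greedily; return None if some position cannot be matched."""
    max_len = max(len(s) for s in inventory)
    result, i = [], 0
    while i < len(text):
        sizes = range(1, min(max_len, len(text) - i) + 1)
        for size in (reversed(sizes) if prefer_long else sizes):
            piece = text[i:i + size]
            if piece in inventory:
                result.append(piece)
                i += size
                break
        else:
            return None  # no syllable matches at position i
    return result


# (priority, method); the method with the higher priority is tried first.
METHODS = [
    (2, lambda t: greedy_segment(t, SYLLABLES, prefer_long=True)),
    (1, lambda t: greedy_segment(t, SYLLABLES, prefer_long=False)),
]


def segment(text: str) -> Optional[List[str]]:
    """Return the result of the highest-priority method that succeeds."""
    for _, method in sorted(METHODS, key=lambda m: m[0], reverse=True):
        result = method(text)
        if result is not None:
            return result
    return None
```

With this inventory, "xian" can be read as one syllable or as "xi" plus "an"; the priority decides which combination becomes the segmentation result.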

The syllable data obtained by segmentation is matched against the target syllable data. In response to a piece of syllable data matching target syllable data of any syllable type, it can be determined that the syllable data matches the target syllable data, and therefore that the voice-driven data includes the target data. For example, the target syllable data may include syllable data of the type "ma", "man", "mang"; in response to the voice-driven data including syllable data that matches any one of "ma", "man", and "mang", it can be determined that the voice-driven data includes the target syllable data.

When the voice-driven data includes target syllable data, the control parameter sequence of the predetermined mouth shape corresponding to the target syllable data can be acquired based on the syllable type to which the target syllable data belongs, and the interactive object can be controlled to make the corresponding mouth shape. In this way, the change of the interactive object's mouth shape can be controlled based on the mouth-shape control parameter sequences corresponding to the voice-driven data, so that the interactive object presents an anthropomorphic speaking state.

Segmentation may yield multiple pieces of syllable data. For each piece, it is checked whether the syllable data matches some target syllable data; if it does, the control parameter sequence of the predetermined mouth shape corresponding to that target syllable data can be acquired.

In some embodiments, as shown in FIG. 4, step 203 further includes the following steps.

In step 2031, the voice information corresponding to the target data is determined.

In step 2032, time information for outputting the voice information is acquired.

In step 2033, the execution time of the predetermined action corresponding to the target data is determined based on the time information.

In step 2034, based on the execution time, the interactive object is controlled to perform the predetermined action using the control parameter sequence corresponding to the target data.

When the display device is controlled to output voice based on the voice information corresponding to the voice-driven data, the time information for outputting the voice information corresponding to the target data can be determined, for example the time at which output of that voice information starts, the time at which it ends, and its duration. The execution time of the predetermined action corresponding to the target data can be determined based on this time information, and the interactive object can be controlled to perform the predetermined action, using the control parameter sequence corresponding to the target data, within the execution time or within a certain range of it.
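Steps 2031 to 2033 can be sketched as follows, assuming that the speech output stage reports a start and end time for each word. The `TimedWord` structure and the function `execution_times` are hypothetical; how the time information is actually obtained is not specified in the description.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class TimedWord:
    text: str
    start: float  # time in seconds at which output of this word's voice starts
    end: float    # time in seconds at which it ends


def execution_times(words: List[TimedWord], targets: Set[str]) -> List[Tuple[str, float, float]]:
    """Return (target data, action start time, action duration) for each word that is target data."""
    return [(w.text, w.start, w.end - w.start) for w in words if w.text in targets]


if __name__ == "__main__":
    speech = [TimedWord("hello", 0.0, 0.5), TimedWord("wave", 0.5, 1.0)]
    print(execution_times(speech, {"wave"}))
```

Step 2034 would then play the control parameter sequence of each returned target data inside its reported window.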

In the embodiments of the present invention, the duration of the voice output based on the voice-driven data is the same as or close to the duration for which the interactive object is controlled to perform the successive predetermined actions based on the multiple control parameter sequences. Likewise, for each piece of target data, the duration of the corresponding voice output is the same as or close to the duration for which the interactive object is controlled to perform the predetermined action based on the corresponding control parameter sequence. The time the interactive object spends speaking therefore matches the time it spends acting, so that its voice and actions are synchronized and coordinated.

In some embodiments, each set of control parameters in the control parameter sequence can be called at a predetermined rate so that the interactive object shows the posture corresponding to each set. That is, the control parameter sequence corresponding to each piece of target data is always executed at a constant rate.

When the target data corresponds to few phonemes but the control parameter sequence of the matching predetermined action is long, that is, when the interactive object needs little time to speak the target data but more time to perform the action, the calling of the control parameter sequence can be stopped at the moment the voice output ends, which stops the predetermined action. The ending posture of that action then transitions smoothly into the starting posture of the next specified action, so that the interactive object's movements are smooth and natural and the target object's interactive experience is improved.
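The fixed-rate case with early stopping amounts to truncating the sequence at the number of sets that fit into the speech duration. The function name `frames_to_play` and the example rate of 30 sets per second are assumptions made for this sketch.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def frames_to_play(sequence: Sequence[T], speech_duration: float, rate: float) -> List[T]:
    """Return the sets of control parameters called at a fixed rate before the voice output ends.

    rate is the number of sets called per second; speech_duration is in seconds.
    """
    count = min(len(sequence), int(speech_duration * rate))
    return list(sequence[:count])


if __name__ == "__main__":
    action = list(range(60))                       # a 60-set sequence, 2 s at 30 sets/s
    print(len(frames_to_play(action, 0.5, 30.0)))  # speech lasts only 0.5 s
```

The sets that are cut off are simply never called; the smooth transition into the next action is a separate step.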

In some embodiments, for each piece of target data, the calling rate of the control parameter sequence corresponding to that target data is determined based on the execution time of the predetermined action corresponding to that target data, and each set of control parameters in the sequence is called at that rate so that the interactive object shows the posture corresponding to each set.

The shorter the execution time, the higher the calling rate of the control parameter sequence, and vice versa. The calling rate determines how fast the interactive object performs the action. For example, when the control parameter sequence is called at a higher rate, the interactive object's posture changes correspondingly faster and the predetermined action is completed in less time.
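A simple way to realize this inverse relationship, assuming the whole sequence should exactly fill the execution time, is to divide the sequence length by the execution time. The function `call_rate` is an illustrative assumption; the description only states that the rate is determined from the execution time.

```python
def call_rate(sequence_length: int, execution_time: float) -> float:
    """Sets of control parameters to call per second so the action fills execution_time seconds."""
    if execution_time <= 0:
        raise ValueError("execution_time must be positive")
    return sequence_length / execution_time


if __name__ == "__main__":
    print(call_rate(60, 2.0))  # longer execution time, lower rate
    print(call_rate(60, 1.0))  # shorter execution time, higher rate
```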

In some embodiments, the time for performing the predetermined action can be adjusted, for example compressed or stretched, based on the duration of the voice that outputs the target data. The time the interactive object takes to perform the predetermined action then matches the duration of the voice that outputs the target data, so that the interactive object's voice and actions are synchronized and coordinated.

In one example, calling of the control parameter sequence corresponding to the target data can begin at a predetermined point in time before the voice of the phonemes corresponding to the target data is output, so that the interactive object starts to perform the predetermined action corresponding to the control parameter sequence.

For example, calling of the control parameter sequence corresponding to the target data can begin a very short time, such as 0.1 seconds, before the interactive object starts to output the voice corresponding to the target data, so that the interactive object starts to perform the predetermined action. This better matches how a real person speaks, makes the interactive object's speech more natural and vivid, and improves the target object's interactive experience.

In some embodiments, when adjacent pieces of target data among the multiple pieces of target data are detected to overlap, the interactive object is controlled to perform the corresponding predetermined action based on the control parameter sequence of the target data that comes first in word order (that is, in the natural order of the received voice-driven data), and the later target data that overlaps it is ignored.

Each piece of target data included in the voice-driven data can be stored in an array, with each piece of target data being one element of the array. Note that because morphemes can be combined in different ways to obtain different target data, two adjacent pieces of target data may share an overlapping part. For example, when the text corresponding to the voice-driven data is "天気が本当にいい" ("the weather is really nice"), the corresponding target data are 1: "天" ("sky"), 2: "天気" ("weather"), and 3: "本当にいい" ("really nice"). Adjacent target data 1 and 2 share the morpheme "天", and both may be matched with the same specified action, for example pointing upward with a finger.

A priority can be set for each piece of target data, and the priority determines which of the overlapping pieces of target data is executed.

In one example, target data that appears earlier can be given a higher priority than target data that appears later. In the "天気が本当にいい" example, "天" has a higher priority than "天気", so the interactive object is controlled to perform the predetermined action based on the control parameter sequence of the predetermined action corresponding to "天". The remaining morpheme "気" is ignored (that is, the target data "天気", which overlaps the target data "天", is ignored), and matching continues directly with "本当にいい".
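The earlier-first rule can be sketched as a single pass over the target data sorted by position. Representing each piece of target data as a `(start, end, text)` span, and breaking ties between spans that start at the same position in favor of the shorter one, are assumptions of this sketch.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start index, end index exclusive, target data)


def drop_overlaps(matches: List[Span]) -> List[Span]:
    """Keep earlier target data and ignore any later target data that overlaps a kept one."""
    kept: List[Span] = []
    for start, end, target in sorted(matches):
        if kept and start < kept[-1][1]:
            continue  # overlaps the previously kept target data, so it is ignored
        kept.append((start, end, target))
    return kept


if __name__ == "__main__":
    text = "天気が本当にいい"
    targets = ["天", "天気", "本当にいい"]
    spans = [(text.find(t), text.find(t) + len(t), t) for t in targets]
    print(drop_overlaps(spans))
```

In the example, "天気" is dropped because it overlaps "天", so the pointing action is performed only once.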

In the embodiments of the present invention, setting a matching rule for the case where adjacent pieces of target data overlap prevents the interactive object from performing a predetermined action repeatedly.

In some embodiments, when the execution times of the control parameter sequences corresponding to adjacent pieces of target data among the multiple pieces of target data are detected to overlap, the overlapping parts of those control parameter sequences can be fused.

In one embodiment, the overlapping control parameter sequences can be fused by taking an average or a weighted average over their overlapping parts.
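A minimal sketch of the weighted-average fusion follows, assuming that each set of control parameters is a flat list of floats and that the last `overlap` sets of the first sequence coincide in time with the first `overlap` sets of the second. The function name, the frame layout, and the constant weight are illustrative assumptions.

```python
from typing import List

Frame = List[float]  # one set of control parameters, flattened


def blend_overlap(first: List[Frame], second: List[Frame], overlap: int,
                  weight: float = 0.5) -> List[Frame]:
    """Fuse two sequences by a weighted average over their overlapping sets.

    weight is applied to the first sequence and (1 - weight) to the second;
    weight=0.5 gives a plain average.
    """
    if not 0 < overlap <= min(len(first), len(second)):
        raise ValueError("overlap must be between 1 and the shorter sequence length")
    mixed = [
        [weight * a + (1.0 - weight) * b for a, b in zip(fa, fb)]
        for fa, fb in zip(first[-overlap:], second[:overlap])
    ]
    return first[:-overlap] + mixed + second[overlap:]
```

A weight that changes across the overlap (a cross-fade) would be a natural variation, but the description only mentions averaging and weighted averaging.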

In another embodiment, interpolation is used to transition from a particular frame of the preceding action (for example, the N-th set of control parameters, n, in the first control parameter sequence corresponding to that action) to the next action over a transition time. Once the transition begins to overlap with the first frame of the next action (for example, by searching the second control parameter sequence corresponding to the next action for the first set of control parameters, 1, that is the same as the control parameters n, or by interpolating the next action onto the particular frame so that the total execution time of the two actions after the interpolated transition equals the playback or display time of the corresponding voice or text data), all frames after the particular frame of the preceding action are ignored and the next action is executed directly. The overlapping control parameter sequences are fused in this way.
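The interpolated transition can be sketched as follows. Linear interpolation, the flat list-of-floats frame layout, and the parameters `cut` (index of the particular frame) and `steps` (number of in-between frames) are assumptions; the description does not fix the interpolation scheme or how the transition time is chosen.

```python
from typing import List

Frame = List[float]  # one set of control parameters, flattened


def interpolate_transition(first: List[Frame], second: List[Frame],
                           cut: int, steps: int) -> List[Frame]:
    """Play first up to index cut, bridge linearly to second[0], then play second.

    All frames of the first action after index cut are ignored.
    """
    start, target = first[cut], second[0]
    bridge = [
        [a + (b - a) * k / (steps + 1) for a, b in zip(start, target)]
        for k in range(1, steps + 1)
    ]
    return first[:cut + 1] + bridge + second
```

In practice `steps` would be chosen so that the combined length matches the playback time of the corresponding voice or text.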

Fusing the overlapping parts of the control parameter sequences corresponding to adjacent pieces of target data lets the interactive object transition smoothly between actions, so that its movements are smooth and natural and the target object's interactive experience is improved.

In some embodiments, the data in the voice-driven data other than the target data can be called, for example, first data. The posture of the interactive object can be controlled based on posture control parameters that match the acoustic features of the first data.

In response to the voice-driven data including audio data, the voice frame sequence included in the first data is acquired, the acoustic features corresponding to at least one voice frame are acquired, and the posture of the interactive object is controlled based on the posture control parameters of the interactive object that correspond to the acoustic features, for example a posture control vector.

In response to the voice-driven data including text data, the acoustic features corresponding to the phonemes are acquired based on the phonemes that correspond to the morphemes in the text data, and the posture of the interactive object is controlled based on the posture control parameters of the interactive object that correspond to the acoustic features, for example a posture control vector.

In the embodiments of the present invention, the acoustic features may be features related to the emotion of the voice, such as fundamental frequency features, formant features, and Mel-frequency cepstral coefficients (MFCC).

Because the posture control parameter values are matched with the voice frame sequence of the voice segment, when the voice output and/or text displayed based on the first data is synchronized with the posture of the interactive object controlled based on the posture parameter values, the posture made by the interactive object is synchronized with the output voice and/or text, which gives the target object the feeling of talking with the interactive object. In addition, because the posture control vector is related to the acoustic features of the output voice, driving the interactive object based on the posture control vector adds an emotional element to its facial expressions and body movements, which makes its speech more natural and vivid and improves the target object's interactive experience.

In some embodiments, the voice-driven data includes at least one piece of target data and first data other than the target data. For the first data, posture control parameters are determined based on the acoustic features of the first data to control the posture of the interactive object. For the target data, the interactive object is controlled to perform the predetermined action based on the control parameter sequence of the predetermined action that matches the target data.
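The two-way handling can be sketched as a dispatch over segments of the voice-driven data. The function `plan_drive`, the segment representation, and the callable `posture_from_features` (standing in for whatever maps acoustic features to posture control parameters) are hypothetical placeholders.

```python
from typing import Any, Callable, Dict, List, Tuple


def plan_drive(segments: List[str],
               action_sequences: Dict[str, List[Any]],
               posture_from_features: Callable[[str], Any]) -> List[Tuple[str, Any]]:
    """Build a drive plan: target data gets its action sequence, first data gets posture parameters."""
    plan: List[Tuple[str, Any]] = []
    for segment in segments:
        if segment in action_sequences:
            # Target data: use the control parameter sequence of the matching action.
            plan.append(("action", action_sequences[segment]))
        else:
            # First data: derive posture control parameters from acoustic features.
            plan.append(("posture", posture_from_features(segment)))
    return plan
```

Keeping the two branches separate means that new target data can be added to `action_sequences` without touching the feature-driven path.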

FIG. 5 is a schematic diagram showing the configuration of an interactive object driving apparatus according to at least one embodiment of the present invention. As shown in FIG. 5, the apparatus includes: a first acquisition unit 301 for acquiring voice-driven data of an interactive object displayed on a display device; a second acquisition unit 302 for acquiring, based on target data included in the voice-driven data, a control parameter sequence of a predetermined action of the interactive object matching the target data; and a drive unit 303 for controlling the interactive object to perform the predetermined action based on the acquired control parameter sequence.

In some embodiments, the apparatus further includes an output unit for controlling the display device to output voice based on voice information corresponding to the voice-driven data, and/or to display text based on text information corresponding to the voice-driven data.

In some embodiments, the drive unit is specifically configured to: determine the voice information corresponding to the target data; acquire time information for outputting the voice information; determine, based on the time information, the execution time of the predetermined action corresponding to the target data; and, based on the execution time, control the interactive object to perform the predetermined action using the control parameter sequence corresponding to the target data.
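
As a rough illustration of how an execution time might be derived from the output timing of the voice information, the following minimal sketch assumes that the speech engine reports a start and end time for each word. The `WordTiming` format and the function name `execution_time` are hypothetical and do not appear in the specification.

```python
from typing import List, Optional, Tuple

# Hypothetical timing record: (word, start_seconds, end_seconds).
# The specification does not define this format; it is assumed here.
WordTiming = Tuple[str, float, float]


def execution_time(timings: List[WordTiming], target: str) -> Optional[Tuple[float, float]]:
    """Return the (start, end) window during which the action for `target` should run.

    The window is simply the interval in which the voice information for the
    target data is output; None is returned if the target is not spoken.
    """
    for word, start, end in timings:
        if word == target:
            return (start, end)
    return None


timings = [("hello", 0.0, 0.4), ("wave", 0.4, 0.9)]
window = execution_time(timings, "wave")
```

With such a window available, the control parameter sequence for the target data can be played back so that the action coincides with the spoken word.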

In some embodiments, the control parameter sequence includes one or more sets of control parameters. When controlling the interactive object to perform the predetermined action using the control parameter sequence corresponding to the target data based on the execution time, the drive unit is specifically configured to call each set of control parameters in the control parameter sequence at a predetermined rate, so that the interactive object displays the posture corresponding to each set of control parameters.

In some embodiments, the control parameter sequence includes one or more sets of control parameters. When controlling the interactive object to perform the predetermined action using the control parameter sequence corresponding to the target data based on the execution time, the drive unit is specifically configured to determine a call rate for the control parameter sequence based on the execution time, and to call each set of control parameters in the control parameter sequence at that call rate, so that the interactive object outputs the posture corresponding to each set of control parameters.
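
One plausible reading of a call rate determined from the execution time is that the sets of control parameters are spread evenly over the available duration. The sketch below works under that assumption; the function names `call_rate` and `call_schedule` are illustrative only.

```python
from typing import List


def call_rate(num_sets: int, duration: float) -> float:
    """Sets of control parameters to call per second so the sequence fits `duration`."""
    if num_sets <= 0 or duration <= 0:
        raise ValueError("num_sets and duration must be positive")
    return num_sets / duration


def call_schedule(num_sets: int, start: float, duration: float) -> List[float]:
    """Time (in seconds) at which each set of control parameters is called."""
    step = duration / num_sets
    return [start + i * step for i in range(num_sets)]


rate = call_rate(30, 1.5)             # 30 sets over 1.5 s
times = call_schedule(4, 2.0, 1.0)    # 4 sets starting at t = 2.0 s
```

A shorter execution time yields a higher call rate, so the same action plays faster when the corresponding speech is brief.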

In some embodiments, the control parameter sequence includes one or more sets of control parameters. When controlling the interactive object to perform the predetermined action using the control parameter sequence corresponding to the target data based on the execution time, the drive unit is specifically configured to start calling the control parameter sequence corresponding to the target data at a predetermined time before the voice information corresponding to the target data is output, so that the interactive object starts to perform the predetermined action.

In some embodiments, the voice-driven data includes a plurality of pieces of target data, and the drive unit is specifically configured to, in response to detecting that adjacent pieces of target data among the plurality overlap, control the interactive object to perform the predetermined action based on the control parameter sequence corresponding to the piece of target data that comes first in word order.
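
The earlier-in-word-order rule can be sketched as a single pass over matches sorted by position. The representation of a match as `(start, end, action)` with an exclusive end index is an assumption made for this example, as is the function name `keep_earlier`.

```python
from typing import List, Tuple

# Hypothetical match record: (start_index, end_index_exclusive, action_name).
Match = Tuple[int, int, str]


def keep_earlier(matches: List[Match]) -> List[Match]:
    """Drop any match that overlaps a previously kept match that starts earlier."""
    kept: List[Match] = []
    for match in sorted(matches, key=lambda m: m[0]):
        if kept and match[0] < kept[-1][1]:
            # Overlaps the previously kept target data: the earlier one wins.
            continue
        kept.append(match)
    return kept


resolved = keep_earlier([(0, 3, "wave"), (2, 5, "nod"), (5, 7, "bow")])
```

Here the match for "nod" overlaps the earlier match for "wave" and is discarded, while "bow" is kept because it starts where "wave" has already ended.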

In some embodiments, the voice-driven data includes a plurality of pieces of target data, and the drive unit is specifically configured to, in response to detecting that the execution times of the control parameter sequences corresponding to adjacent pieces of target data among the plurality overlap, fuse the overlapping portions of the control parameter sequences corresponding to the adjacent pieces of target data.
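
The passage above does not say how the overlapping portions are fused. The sketch below assumes a simple element-wise average over the overlapping frames, which is only one possible choice (a weighted blend would work the same way). Frame indices, the function name `fuse`, and the data layout are all hypothetical.

```python
from typing import List

Frame = List[float]  # one set of control parameters (assumed to be a flat vector)


def fuse(seq_a: List[Frame], start_a: int, seq_b: List[Frame], start_b: int) -> List[Frame]:
    """Merge two sequences whose frame ranges overlap, averaging the overlap.

    `start_a` and `start_b` are the frame indices at which each sequence begins,
    with start_a <= start_b. The result starts at frame `start_a`.
    """
    end_a = start_a + len(seq_a)
    end_b = start_b + len(seq_b)
    if not (start_a <= start_b < end_a):
        raise ValueError("sequences do not overlap in the expected order")
    fused: List[Frame] = []
    for t in range(start_a, max(end_a, end_b)):
        in_a = t < end_a
        in_b = start_b <= t < end_b
        if in_a and in_b:
            frame_a, frame_b = seq_a[t - start_a], seq_b[t - start_b]
            fused.append([(x + y) / 2 for x, y in zip(frame_a, frame_b)])
        elif in_a:
            fused.append(list(seq_a[t - start_a]))
        else:
            fused.append(list(seq_b[t - start_b]))
    return fused


merged = fuse([[0.0], [1.0], [2.0]], 0, [[4.0], [6.0]], 2)
```

In this example only frame 2 is shared by both sequences, so it becomes the average of 2.0 and 4.0, and the remaining frames are copied unchanged.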

In some embodiments, the second acquisition unit is specifically configured to: in response to the voice-driven data including audio data, perform speech recognition on the audio data and determine the target data included in the audio data based on the recognized speech content; and, in response to the voice-driven data including text data, determine the target data included in the text data based on the text content of the text data.
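
For the text branch, determining the target data can be pictured as a keyword search over the text content. The following sketch assumes the target data are plain keywords; the function name `find_targets` is hypothetical, and for audio input the text would first have to come from a speech recognizer, which is not shown.

```python
from typing import List, Tuple


def find_targets(text: str, keywords: List[str]) -> List[Tuple[int, int, str]]:
    """Return (start, end_exclusive, keyword) for every keyword occurrence in `text`."""
    hits: List[Tuple[int, int, str]] = []
    for keyword in keywords:
        pos = text.find(keyword)
        while pos != -1:
            hits.append((pos, pos + len(keyword), keyword))
            pos = text.find(keyword, pos + 1)
    return sorted(hits)


hits = find_targets("wave and bow, then wave", ["wave", "bow"])
```

Each hit records where in the text the target data occurs, which is the information needed later to align the predetermined action with the corresponding speech.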

In some embodiments, the target data includes target syllable data, and the second acquisition unit is specifically configured to: determine whether syllable data included in the voice-driven data matches target syllable data, where the target syllable data belongs to syllable types divided in advance, different syllable types correspond to different predetermined mouth shapes, and a corresponding control parameter sequence is set for each predetermined mouth shape; and, in response to the syllable data matching the target syllable data, acquire the control parameter sequence of the predetermined mouth shape corresponding to the matched target syllable data, based on the syllable type to which the matched target syllable data belongs.
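
The two-level lookup described above (syllable to syllable type, syllable type to mouth-shape control parameter sequence) can be sketched with two dictionaries. Every syllable, type name, and parameter value below is a made-up placeholder, not data from the specification.

```python
from typing import Dict, List, Optional

# Hypothetical assignment of target syllables to pre-divided syllable types.
SYLLABLE_TYPE: Dict[str, str] = {
    "ma": "closed_then_open",
    "ba": "closed_then_open",
    "wo": "rounded",
}

# Hypothetical control parameter sequence for the mouth shape of each type.
MOUTH_SEQUENCES: Dict[str, List[List[float]]] = {
    "closed_then_open": [[0.0], [0.5], [1.0]],
    "rounded": [[0.3], [0.6]],
}


def mouth_sequence(syllable: str) -> Optional[List[List[float]]]:
    """Return the mouth-shape control parameter sequence for a matched target syllable."""
    syllable_type = SYLLABLE_TYPE.get(syllable)
    if syllable_type is None:
        return None  # not a target syllable
    return MOUTH_SEQUENCES[syllable_type]


sequence = mouth_sequence("ma")
```

Because "ma" and "ba" share a syllable type in this toy table, they reuse the same control parameter sequence, which is the point of grouping syllables by type.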

In some embodiments, the apparatus further includes a posture control unit for acquiring first data other than the target data in the voice-driven data, acquiring acoustic features of the first data, acquiring posture control parameters matching the acoustic features, and controlling the posture of the interactive object based on the posture control parameters.

At least one embodiment of this specification further provides an electronic device. As shown in FIG. 6, the device includes a memory and a processor; the memory stores computer instructions executable on the processor, and the processor implements the method for driving an interactive object according to any embodiment of the present invention when executing the computer instructions. At least one embodiment of this specification further provides a computer-readable recording medium storing a computer program which, when executed by a processor, implements the method for driving an interactive object according to any embodiment of the present invention.

Those skilled in the art should understand that one or more embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, one or more embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Furthermore, one or more embodiments of the present invention may take the form of a computer program product implemented on one or more computer-usable recording media (including, but not limited to, disk memory, CD-ROM, and optical memory) containing computer-usable program code.

The embodiments of the present invention are described in a progressive manner. Identical or similar parts of the embodiments may be referred to across embodiments, and each embodiment focuses on its differences from the others. In particular, the data processing device embodiment is described relatively briefly because it is basically similar to the method embodiment; for the relevant parts, refer to the corresponding description of the method embodiment.

Specific embodiments of the present invention have been described above. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or sequential order, to achieve the desired results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

Embodiments of the subject matter and the functional operations described in the present invention can be implemented in digital electronic circuitry, in tangibly embodied computer software or firmware, in computer hardware including the structures disclosed in the present invention and their structural equivalents, or in combinations of one or more of them. Embodiments of the subject matter of the present invention can be implemented as one or more computer programs, that is, one or more modules of computer program instructions encoded on a tangible non-transitory program carrier for execution by, or to control the operation of, a data processing apparatus. Alternatively or additionally, the program instructions can be encoded on an artificially generated propagated signal, for example a machine-generated electrical, optical, or electromagnetic signal, which is generated to encode information for transmission to a suitable receiver apparatus for execution by a data processing apparatus. The computer recording medium can be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.

The processes and logic flows described in the present invention can be performed by one or more programmable computers executing one or more computer programs to perform the corresponding functions by operating on input data and generating output. The processes and logic flows can also be performed by special-purpose logic circuitry, for example an FPGA (field-programmable gate array) or an ASIC (application-specific integrated circuit), and the apparatus can also be implemented as special-purpose logic circuitry.

Computers suitable for executing a computer program include, for example, general-purpose and/or special-purpose microprocessors, or any other kind of central processing unit. Generally, a central processing unit receives instructions and data from a read-only memory and/or a random access memory. The basic components of a computer are a central processing unit for performing or executing instructions and one or more memory devices for storing instructions and data. Generally, a computer also includes one or more mass storage devices for storing data, such as magnetic disks, magneto-optical disks, or optical disks, or is operatively coupled to such mass storage devices to receive data from them, transfer data to them, or both. However, a computer need not have such devices. Moreover, a computer can be embedded in another device, for example a mobile phone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device such as a universal serial bus (USB) flash drive, to name just a few.

Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices, including, for example, semiconductor memory devices (for example, EPROM, EEPROM, and flash memory devices), magnetic disks (for example, internal hard disks or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special-purpose logic circuitry.

Although the present invention contains many specific implementation details, these should not be construed as limiting the scope of the invention or of what is claimed; they are used mainly to describe features of particular embodiments of the invention. Certain features described in the context of multiple embodiments can also be implemented in combination in a single embodiment. Conversely, various features described in a single embodiment can be implemented separately in multiple embodiments or in any suitable sub-combination. Moreover, although features may be described above as acting in certain combinations, and may even be initially claimed as such, one or more features from a claimed combination can in some cases be removed from that combination, and the claimed combination may be directed to a sub-combination or a variation of a sub-combination.

Similarly, although operations are depicted in the drawings in a particular order, this should not be understood as requiring that the operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, in order to achieve the desired results. In some cases, multitasking and parallel processing may be advantageous. Moreover, the separation of the various system modules and components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.

Thus, specific embodiments of the subject matter have been described. Other embodiments are within the scope of the appended claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing may be advantageous.

The above are merely some embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of one or more embodiments of the present invention.

Claims

A method for driving an interactive object, comprising:
acquiring voice-driven data of an interactive object displayed on a display device;
acquiring, based on target data included in the voice-driven data, a control parameter sequence of a predetermined action of the interactive object matching the target data; and
controlling the interactive object to perform the predetermined action based on the acquired control parameter sequence.

The method for driving an interactive object according to claim 1, further comprising: controlling the display device to output voice based on voice information corresponding to the voice-driven data, and/or displaying text based on text information corresponding to the voice-driven data.

The method for driving an interactive object according to claim 1 or 2, wherein controlling the interactive object to perform the predetermined action based on the acquired control parameter sequence comprises:
determining voice information corresponding to the target data;
acquiring time information for outputting the voice information;
determining, based on the time information, an execution time of the predetermined action corresponding to the target data; and
controlling, based on the execution time, the interactive object to perform the predetermined action using the control parameter sequence corresponding to the target data.

The method for driving an interactive object according to claim 3, wherein the control parameter sequence comprises one or more sets of control parameters, and
controlling, based on the execution time, the interactive object to perform the predetermined action using the control parameter sequence corresponding to the target data comprises:
calling each set of control parameters in the control parameter sequence at a predetermined rate so that the interactive object displays the posture corresponding to each set of control parameters.

The method for driving an interactive object according to claim 3, wherein the control parameter sequence comprises one or more sets of control parameters, and
controlling, based on the execution time, the interactive object to perform the predetermined action using the control parameter sequence corresponding to the target data comprises:
determining a call rate of the control parameter sequence based on the execution time; and
calling each set of control parameters in the control parameter sequence at the call rate so that the interactive object outputs the posture corresponding to each set of control parameters.

The method for driving an interactive object according to claim 3, wherein controlling, based on the execution time, the interactive object to perform the predetermined action using the control parameter sequence corresponding to the target data comprises:
starting to call the control parameter sequence corresponding to the target data at a predetermined time before the voice information corresponding to the target data is output, so that the interactive object starts to perform the predetermined action.

The method for driving an interactive object according to any one of claims 1 to 6, wherein the voice-driven data includes a plurality of pieces of target data, and
controlling the interactive object to perform the predetermined action based on the acquired control parameter sequence comprises:
in response to detecting that adjacent pieces of target data among the plurality of pieces of target data overlap, controlling the interactive object to perform the predetermined action based on the control parameter sequence corresponding to the piece of target data that comes first in word order.

The method for driving an interactive object according to any one of claims 1 to 6, wherein the voice-driven data includes a plurality of pieces of target data, and
controlling the interactive object to perform the predetermined action based on the control parameter sequence corresponding to the target data comprises:
in response to detecting that the execution times of the control parameter sequences corresponding to adjacent pieces of target data among the plurality of pieces of target data overlap, fusing the overlapping portions of the control parameter sequences corresponding to the adjacent pieces of target data.

The method for driving an interactive object according to any one of claims 1 to 8, wherein acquiring, based on the target data included in the voice-driven data, the control parameter sequence of the predetermined action of the interactive object matching the target data comprises:
in response to the voice-driven data including audio data, performing speech recognition on the audio data and determining the target data included in the audio data based on the recognized speech content; and
in response to the voice-driven data including text data, determining the target data included in the text data based on the text content of the text data.

The method for driving an interactive object according to any one of claims 1 to 9, wherein the voice-driven data includes syllable data, and
acquiring, based on the target data included in the voice-driven data, the control parameter sequence of the predetermined action of the interactive object matching the target data comprises:
determining whether the syllable data included in the voice-driven data matches target syllable data, wherein the target syllable data belongs to syllable types divided in advance, different syllable types correspond to different predetermined mouth shapes, and a corresponding control parameter sequence is set for each predetermined mouth shape; and
in response to the syllable data matching the target syllable data, acquiring the control parameter sequence of the predetermined mouth shape corresponding to the matched target syllable data, based on the syllable type to which the matched target syllable data belongs.

The method for driving an interactive object according to any one of claims 1 to 10, further comprising:
acquiring first data other than the target data in the voice-driven data;
acquiring acoustic features of the first data;
acquiring posture control parameters matching the acoustic features; and controlling the posture of the interactive object based on the posture control parameters.

An apparatus for driving an interactive object, comprising:
a first acquisition unit for acquiring voice-driven data of an interactive object displayed on a display device;
a second acquisition unit for acquiring, based on target data included in the voice-driven data, a control parameter sequence of a predetermined action of the interactive object matching the target data; and
a drive unit for controlling the interactive object to perform the predetermined action based on the acquired control parameter sequence.

The apparatus for driving an interactive object according to claim 12, further comprising an output unit for controlling the display device to output voice based on voice information corresponding to the voice-driven data, and/or to display text based on text information corresponding to the voice-driven data.

The apparatus for driving an interactive object according to claim 12 or 13, wherein the drive unit is specifically configured to:
determine voice information corresponding to the target data;
acquire time information for outputting the voice information;
determine, based on the time information, an execution time of the predetermined action corresponding to the target data; and
control, based on the execution time, the interactive object to perform the predetermined action using the control parameter sequence corresponding to the target data.

The apparatus for driving an interactive object according to claim 14, wherein the control parameter sequence comprises one or more sets of control parameters, and
when controlling, based on the execution time, the interactive object to perform the predetermined action using the control parameter sequence corresponding to the target data, the drive unit is specifically configured to:
call each set of control parameters in the control parameter sequence at a predetermined rate so that the interactive object displays the posture corresponding to each set of control parameters; or
determine a call rate of the control parameter sequence based on the execution time, and
call each set of control parameters in the control parameter sequence at the call rate so that the interactive object outputs the posture corresponding to each set of control parameters; or
start calling the control parameter sequence corresponding to the target data at a predetermined time before the voice information corresponding to the target data is output, so that the interactive object starts to perform the predetermined action.

The apparatus for driving an interactive object according to any one of claims 12 to 15, wherein the voice-driven data includes a plurality of pieces of target data, and
the drive unit is specifically configured to:
in response to detecting that adjacent pieces of target data among the plurality of pieces of target data overlap, control the interactive object to perform the predetermined action based on the control parameter sequence corresponding to the piece of target data that comes first in word order; or
in response to detecting that the execution times of the control parameter sequences corresponding to adjacent pieces of target data among the plurality of pieces of target data overlap, fuse the overlapping portions of the control parameter sequences corresponding to the adjacent pieces of target data.

The apparatus for driving an interactive object according to any one of claims 12 to 16, wherein the voice-driven data includes syllable data, and
the second acquisition unit is specifically configured to:
determine whether the syllable data included in the voice-driven data matches target syllable data, wherein the target syllable data belongs to a syllable type divided in advance, each syllable type corresponds to a predetermined mouth shape, and a corresponding control parameter sequence is set for each predetermined mouth shape; and
in response to the syllable data matching the target syllable data, acquire the control parameter sequence of the predetermined mouth shape corresponding to the matched target syllable data, based on the syllable type to which the matched target syllable data belongs.

The apparatus for driving an interactive object according to any one of claims 12 to 17, further comprising a posture control unit,
wherein the posture control unit is configured to:
acquire first data other than the target data in the voice-driven data;
acquire acoustic features of the first data;
acquire posture control parameters matching the acoustic features; and
control the posture of the interactive object based on the posture control parameters.

An electronic device, comprising a memory and a processor, wherein
the memory stores computer instructions executable on the processor, and
the processor implements the method according to any one of claims 1 to 11 when executing the computer instructions.

A computer-readable recording medium storing a computer program, wherein the method according to any one of claims 1 to 11 is implemented when the program is executed by a processor.