JP2019057116A

JP2019057116A - Lip sync processing program, recording medium, and lip sync processing method

Info

Publication number: JP2019057116A
Application number: JP2017180863A
Authority: JP
Inventors: 達郎折笠; Tatsuro Origasa
Original assignee: Koei Tecmo Holdings Co Ltd
Current assignee: Koei Tecmo Holdings Co Ltd
Priority date: 2017-09-21
Filing date: 2017-09-21
Publication date: 2019-04-11
Anticipated expiration: 2037-09-21
Also published as: JP7129769B2

Abstract

PROBLEM TO BE SOLVED: To reduce the processing load of lip sync processing. SOLUTION: An information processing device 3 is made to function as: a motion pattern acquisition processing unit 13 that acquires a mouth-shape motion pattern of a character corresponding to a predetermined voice sequence; a volume level acquisition processing unit 15 that acquires the volume level of the voice to be uttered by the character; an application rate determination processing unit 17 that determines, as an application rate, the degree of magnitude of the motion pattern to be applied to the character on the basis of the volume level; and a motion pattern execution processing unit 19 that executes the motion pattern at a magnitude based on the application rate in time with the character's utterance. SELECTED DRAWING: Figure 2

Description

The present invention relates to a lip sync processing program, a recording medium on which the lip sync processing program is recorded, and a lip sync processing method.

In recent years, computer graphics animation has been used in various fields of video content such as games, television programs, and movies. Such animation uses lip sync techniques that synchronize the utterances of the characters that appear with the shape of their lips.

For example, in the game device described in Patent Document 1, the player's voice captured by voice input means is frequency-analyzed by voice analysis means, the sound is identified from the sound pressure distribution characteristics of each frequency component in the audible frequency band, and voice synchronization means changes the lip shape of the character according to the sound.

Japanese Patent Laid-Open No. 2002-351489

The above prior art requires complicated arithmetic processing to analyze the voice, and therefore has the problem of a heavy processing load.

The present invention has been made in view of this problem, and its object is to provide a lip sync processing program, a recording medium, and a lip sync processing method capable of reducing the processing load.

To achieve the above object, the lip sync processing program of the present invention causes an information processing device to function as: a motion pattern acquisition processing unit that acquires a mouth-shape motion pattern of an object corresponding to a predetermined voice sequence; a volume level acquisition processing unit that acquires the volume level of the voice to be uttered by the object; an application rate determination processing unit that determines, as an application rate, the degree of magnitude of the motion pattern to be applied to the object on the basis of the volume level; and a motion pattern execution processing unit that executes the motion pattern at a magnitude based on the application rate in time with the utterance of the object.

To achieve the above object, the recording medium of the present invention is a recording medium, readable by an information processing device, on which the above lip sync processing program is recorded.

To achieve the above object, the lip sync processing method of the present invention is a lip sync processing method executed by an information processing device, and includes the steps of: acquiring a mouth-shape motion pattern of an object corresponding to a predetermined voice sequence; acquiring the volume level of the voice to be uttered by the object; determining, as an application rate, the degree of magnitude of the motion pattern to be applied to the object on the basis of the volume level; and executing the motion pattern at a magnitude based on the application rate in time with the utterance of the object.

According to the present invention, the processing load of lip sync processing can be reduced.

FIG. 1 is a system configuration diagram showing an example of the overall configuration of a game system according to one embodiment.
FIG. 2 is a block diagram showing an example of the functional configuration of the information processing device.
FIG. 3 is an explanatory diagram illustrating an example of a voice waveform and an application rate.
FIG. 4 is an explanatory diagram illustrating an example of the magnitude of the opening and closing motion of a character's mouth based on the application rate.
FIG. 5 is an explanatory diagram illustrating an example in which a laughing expression and the mouth-shape motion pattern are combined.
FIG. 6 is an explanatory diagram illustrating an example in which a crying expression and the mouth-shape motion pattern are combined.
FIG. 7 is an explanatory diagram illustrating an example in which an angry expression and the mouth-shape motion pattern are combined.
FIG. 8 is a flowchart showing an example of a processing procedure executed by the information processing device.
FIG. 9 is a block diagram showing an example of the hardware configuration of the information processing device.
FIG. 10 is a system configuration diagram showing an example of the overall configuration of a game system according to a modification that includes a voice input device.

<1. Background of the Invention>
Before describing an embodiment of the present invention, the background of the invention is described.

In recent years, CG (computer graphics) animation has been used in various fields of video content such as games, theaters, online distribution, television programs, and movies. Such animation uses lip sync techniques that synchronize the voice uttered by a character with the shape of its mouth.

In general lip sync techniques, multiple types of mouth shapes are prepared in advance, each corresponding to an individual sound uttered by the character (for example, a vowel, or a combination of a vowel and a consonant), and the mouth shape is fitted and changed sound by sound. Because the sounds must be identified, complicated arithmetic processing is needed, for example to analyze the input voice, and the processing load becomes heavy. In addition, creating the animation data requires effort such as manually entering vowel information, which increases the burden of creating the animation data.

As a result of intensive research, the inventors found that a reasonably natural lip sync (synchronization of the character's utterance with its mouth shape) can be achieved, without identifying the sounds uttered by the character, by using a mouth-shape motion pattern that corresponds to a predetermined voice sequence. This eliminates the need for voice analysis, which greatly reduces the processing load, and also removes the effort of entering vowel information and the like, which reduces the burden of creating animation data. The details are described below.

<2. Overall Configuration of the Game System>
An embodiment of the present invention is described with reference to the drawings. This embodiment describes a case in which the lip sync processing program and related aspects of the present invention are applied to a game, but the application is not limited to games.

First, an example of the overall configuration of the game system 1 according to this embodiment is described with reference to FIG. 1. As shown in FIG. 1, the game system 1 includes an information processing device 3, a controller 5, and a display device 7. The controller 5 and the display device 7 are each communicably connected to the information processing device 3. Although FIG. 1 shows a wired connection, the connection may be wireless.

The information processing device 3 is, for example, a stationary game console. It is not limited to this, however, and may be, for example, a portable game console that integrates an input unit, a display unit, and the like. Besides game consoles, it may be a device manufactured and sold as a computer, such as a server computer, desktop computer, notebook computer, or tablet computer, or a device manufactured and sold as a telephone, such as a mobile phone, smartphone, or phablet.

The player performs various operation inputs using the controller 5. In the example shown in FIG. 1, the controller 5 has, for example, a directional pad 9 and a plurality of buttons 11. Instead of or in addition to these, the controller 5 may have, for example, a joystick or a touch pad.

This embodiment describes a case in which the information processing device 3 executes a game program, which is an example of the lip sync processing program.

<3. Functional Configuration of the Information Processing Device>
Next, an example of the functional configuration of the information processing device 3 is described with reference to FIG. 2 and FIGS. 3 to 7.

As shown in FIG. 2, the information processing device 3 includes a motion pattern acquisition processing unit 13, a volume level acquisition processing unit 15, an application rate determination processing unit 17, a motion pattern execution processing unit 19, a reset processing unit 21, a smoothing processing unit 23, and a motion synthesis processing unit 25.

The motion pattern acquisition processing unit 13 acquires a mouth-shape motion pattern of a character (an example of an object) corresponding to a predetermined voice sequence. The predetermined voice sequence consists of, for example, a plurality of vowels arranged in random order. The order may instead be predetermined rather than random. The sequence may also include combinations of vowels and consonants, not only vowels.

The motion pattern is the series of mouth-shape motions made when the character utters the above voice sequence. Although the magnitude of the motion varies with the application rate described later, the pattern itself does not change according to the content of the voice the character utters. The motion pattern is generated so that the motion corresponding to the voice sequence starts from a state in which the mouth is closed, and the portion corresponding to the voice sequence is repeated while the character is speaking.
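
The pattern described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the per-vowel opening values in `VOWEL_OPENNESS`, the keyframe-list representation, and the function names are hypothetical and do not come from the publication, which does not specify how a pattern is stored.

```python
import random

# Hypothetical mouth-opening targets per vowel (0.0 = closed, 1.0 = fully open).
VOWEL_OPENNESS = {"a": 1.0, "i": 0.3, "u": 0.2, "e": 0.6, "o": 0.8}


def build_motion_pattern(length, seed=None):
    """Return mouth-opening keyframes for a random vowel sequence.

    Keyframe 0 is the closed mouth the pattern starts from; the remaining
    keyframes correspond to the vowels of the voice sequence.
    """
    rng = random.Random(seed)
    vowels = [rng.choice(list(VOWEL_OPENNESS)) for _ in range(length)]
    return [0.0] + [VOWEL_OPENNESS[v] for v in vowels]


def sample_looped(pattern, frame):
    """Sample the pattern at a frame index, repeating only the voiced part."""
    if frame < len(pattern):
        return pattern[frame]
    body = pattern[1:]  # skip the closed-mouth lead-in when repeating
    return body[(frame - len(pattern)) % len(body)]
```

Note that nothing in this sketch looks at what the character is actually saying: the same keyframes are replayed regardless of the dialogue, which is the point of the approach.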

One motion pattern is set for each character. The content of the voice sequence and the manner of the motion may be set differently according to the character's individuality (personality, abilities, and so on): for example, fast speech for an impatient character, or smaller motions for a quiet character. A plurality of patterns may also be set for each character.

The type of character is not particularly limited as long as it is an object with a mouth capable of speaking motions. Examples include human characters, non-human animal characters, imaginary creatures other than humans and animals, and non-living objects. This embodiment takes as an example the case where the object is a human character, as shown in FIGS. 4 to 7 described later.

When the game program is executed, the motion pattern is recorded in the ROM 103, the RAM 105, the recording device 117, or the like of the information processing device 3 (see FIG. 9, described later). The motion pattern acquisition processing unit 13 reads the motion pattern from these to acquire it. The motion pattern may also be acquired from an external server or the like.

The volume level acquisition processing unit 15 acquires the volume level of the voice to be uttered by the character. The volume level is acquired continuously, for example at predetermined time intervals. The volume level is included in the voice information together with the content of the voice (the lines of dialogue) and the like, and is recorded in the ROM 103, the RAM 105, the recording device 117, or the like of the information processing device 3 when the game program is executed. The volume level acquisition processing unit 15 reads the volume level from these to acquire it. When voice is input in real time from voice input means such as a microphone, the volume level acquisition processing unit 15 acquires the volume level based on the volume of the input voice (see FIG. 10, described later).

The upper part of FIG. 3 shows an example of a voice waveform. In FIG. 3, the amplitude of the voice waveform corresponds to the volume level. In the example shown in FIG. 3, the utterance starts at time T1, the volume is relatively high from time T1 to T2 and relatively low from time T2 to T3, and the utterance ends at time T3.
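
As a rough illustration of taking volume levels from a waveform at fixed intervals, the sketch below uses the peak absolute amplitude of each frame. The choice of peak amplitude (rather than, say, RMS), the function name `volume_levels`, and the `frame_size` parameter are assumptions made for this example; the publication only states that the waveform amplitude corresponds to the volume level.

```python
def volume_levels(samples, frame_size):
    """Return one volume level per frame: the peak absolute amplitude.

    `samples` is a sequence of waveform amplitudes; `frame_size` is the
    number of samples in each acquisition interval (hypothetical parameter).
    """
    if frame_size < 1:
        raise ValueError("frame_size must be at least 1")
    return [
        max(abs(s) for s in samples[start:start + frame_size])
        for start in range(0, len(samples), frame_size)
    ]
```

For the samples `[0.0, 0.0, 0.5, -0.8, 0.2, -0.1]` with `frame_size=2`, this yields `[0.0, 0.8, 0.2]`: silence, a loud frame, then a quieter one.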

Returning to FIG. 2, the application rate determination processing unit 17 determines, as the application rate, the degree of magnitude of the motion pattern to be applied to the character on the basis of the volume level. The application rate is determined as a value from 0% to 100%, for example. At 0% the character's mouth is closed; as the value increases, the magnitude of the motion pattern reflected in the character increases, reaching its maximum at 100%. The application rate determination processing unit 17 sets the application rate to 0% when the volume level is 0 (including values near 0), increases the application rate as the volume level increases, and sets the application rate to 100% when the volume level is at or above a preset maximum value.
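
The mapping from volume level to application rate might look like the sketch below. The linear ramp between the silence threshold and the maximum is an assumption: the publication only requires 0% at (or near) zero volume, a rate that grows with volume, and 100% at or above a preset maximum. The names `application_rate`, `max_volume`, and `silence_threshold` are hypothetical.

```python
def application_rate(volume, max_volume, silence_threshold=0.0):
    """Map a volume level to an application rate in percent (0.0 to 100.0).

    Volumes at or below `silence_threshold` count as "0 or near 0" and give
    0%; volumes at or above `max_volume` give 100%; values in between are
    interpolated linearly (an assumption made for this sketch).
    """
    if max_volume <= silence_threshold:
        raise ValueError("max_volume must exceed silence_threshold")
    if volume <= silence_threshold:
        return 0.0
    if volume >= max_volume:
        return 100.0
    span = max_volume - silence_threshold
    return 100.0 * (volume - silence_threshold) / span
```

With `max_volume=1.0`, a volume of 0.5 gives 50.0, and any volume of 1.0 or more is clamped to 100.0.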

As shown in the upper part of FIG. 3, the volume level may fluctuate sharply. If the application rate were determined directly from the raw volume level, the magnitude of the character's mouth motion would also change sharply, resulting in unnatural motion.

In this embodiment, therefore, the smoothing processing unit 23 smooths the volume level. The smoothing is performed continuously, each time a volume level is acquired at the volume level acquisition interval. The smoothing method is not particularly limited; one option is a moving average, in which the average of the volume level acquired at a given time and one or more volume levels acquired before that time is calculated and used as the volume level for that time. A method other than a moving average may also be adopted. The application rate determination processing unit 17 described above determines the application rate on the basis of the volume level smoothed by the smoothing processing unit 23.
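
A minimal sketch of the moving-average option follows. The class name `MovingAverage` and the `window` parameter are hypothetical; the publication does not fix the number of past samples to average.

```python
from collections import deque


class MovingAverage:
    """Average the newest volume level with up to `window - 1` earlier ones."""

    def __init__(self, window):
        if window < 1:
            raise ValueError("window must be at least 1")
        # A bounded deque discards the oldest sample automatically.
        self._samples = deque(maxlen=window)

    def update(self, level):
        """Record a newly acquired level and return the smoothed level."""
        self._samples.append(level)
        return sum(self._samples) / len(self._samples)
```

With a window of 3, a jump from 0.0 to 3.0 is reported as 1.5, then 2.0, then 3.0 over successive updates, so the mouth opens gradually instead of snapping open.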

FIG. 3 also shows an example of the application rate. The application rate shown in FIG. 3 is obtained by smoothing the voice waveform in the upper part with the smoothing processing unit 23 and having the application rate determination processing unit 17 determine the rate from the smoothed volume level. As shown in FIG. 3, the volume level is 0 before time T1, so the application rate is also 0. At time T1 the utterance starts and the volume level begins to rise sharply, but because the volume level is smoothed, the application rate rises gently rather than steeply and stays at a relatively large value until time T2. At time T2 the volume drops sharply, but because of the smoothing the application rate falls gently rather than steeply and stays at a relatively small value until time T3. When the utterance ends at time T3 and the volume level falls to 0, the application rate also becomes 0.

In the above, the volume level is smoothed and the application rate is determined from the smoothed volume level. Alternatively, the application rate may be determined without smoothing the volume level, and the resulting application rate may then be smoothed.

Returning to FIG. 2, the motion pattern execution processing unit 19 executes the motion pattern at a magnitude based on the application rate, in time with the character's utterance. This makes it possible to vary the magnitude of the opening and closing motion of the character's mouth using only the volume level of the character's voice as input.

FIG. 4 shows an example of the magnitude of the opening and closing motion of the character's mouth based on the application rate. As shown in FIG. 4, the opening and closing motion is large when the application rate is large (when the volume level is high) and small when the application rate is small (when the volume level is low). Although the magnitude of the motion varies with the application rate, the repeated motion pattern is the same, so the series of mouth movements itself (the movements corresponding to the predetermined voice sequence) does not change when the application rate changes.
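
One simple way to read "executing the pattern at a magnitude based on the application rate" is to scale each pattern value by the rate. The sketch below assumes a pattern stored as mouth-opening values between 0.0 and 1.0 and a plain multiplicative scaling; both are assumptions for illustration, as is the function name `apply_rate`.

```python
def apply_rate(pattern_value, rate_percent):
    """Scale one mouth-opening value by the application rate.

    `rate_percent` is clamped to the 0 to 100 range before scaling.
    """
    rate = min(max(rate_percent, 0.0), 100.0)
    return pattern_value * rate / 100.0


# The same pattern at two application rates: the shape of the movement is
# identical, only its magnitude differs.
pattern = [0.0, 1.0, 0.5, 0.75]
loud = [apply_rate(v, 100.0) for v in pattern]
quiet = [apply_rate(v, 50.0) for v in pattern]
```

Here `quiet` is simply `loud` halved, which mirrors the statement that the series of mouth movements stays the same while its magnitude follows the volume.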

The motion pattern execution processing unit 19 executes the motion pattern repeatedly. If the application rate is varied while the pattern is simply left repeating, unnatural motion can occur: for example, the application rate may begin to rise from 0 just as the mouth in the motion pattern begins to close, so that the mouth starts closing at the moment the character starts speaking.

In this embodiment, therefore, when the volume level starts to rise from 0 (including values near 0), the reset processing unit 21 resets the start position of the motion pattern so that the motion pattern is executed from the beginning. As described above, the motion pattern is generated so that the motion corresponding to the predetermined voice sequence starts from a closed mouth. Resetting the start position at the moment the volume level starts to rise from 0 therefore produces natural mouth motion that is synchronized with the moment the character starts speaking.
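
The reset can be thought of as rising-edge detection on the volume level. The following is a sketch under assumptions: the class `PatternPlayer`, the frame-by-frame `step` interface, and the `silence_threshold` used to decide "0 or near 0" are all hypothetical.

```python
class PatternPlayer:
    """Track the playback position of a repeating motion pattern."""

    def __init__(self, pattern_length, silence_threshold=0.0):
        self.pattern_length = pattern_length
        self.silence_threshold = silence_threshold
        self.position = 0
        self._was_silent = True

    def step(self, volume):
        """Advance one frame and return the pattern position to play."""
        silent = volume <= self.silence_threshold
        if self._was_silent and not silent:
            # Volume has just started rising from (near) zero: restart the
            # pattern so the mouth opens from its closed starting pose.
            self.position = 0
        else:
            self.position = (self.position + 1) % self.pattern_length
        self._was_silent = silent
        return self.position
```

Feeding the volumes `[0.0, 0.0, 0.5, 0.6, 0.0, 0.7]` to a player with a pattern length of 4 gives positions `[1, 2, 0, 1, 2, 0]`: the position returns to 0 on each frame where speech begins.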

Returning to FIG. 2, the motion synthesis processing unit 25 combines the character's mouth-shape motion pattern executed by the motion pattern execution processing unit 19 with the motion of the character's facial expression. Separate motion patterns are set for the character's facial expressions according to the character's emotions and the like. By combining the expression motion pattern with the mouth-shape motion pattern, the motion synthesis processing unit 25 can pair the expression of emotion through the face with mouth opening and closing that matches the voice.

FIGS. 5 to 7 show examples in which a facial expression and the mouth-shape motion pattern are combined. In the example of FIG. 5, the mouth-shape motion pattern is combined with a laughing face. In the example of FIG. 6, it is combined with a crying face. In the example of FIG. 7, it is combined with an angry face. This makes it possible to combine emotional expression through the character's face with lip sync, improving the realism of the character.

The application rate may also be varied according to the expression being combined: for example, set somewhat higher for a laughing or angry face and somewhat lower for a crying face.
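
The combination of expression and mouth motion, together with the expression-dependent adjustment of the application rate, could be sketched as below. Everything specific here is a placeholder: the scale factors in `EXPRESSION_RATE_SCALE`, the pose dictionary with a `jaw_open` channel, and the additive blend are assumptions for illustration, not the publication's synthesis method.

```python
# Hypothetical per-expression multipliers for the application rate.
EXPRESSION_RATE_SCALE = {"laugh": 1.25, "angry": 1.25, "cry": 0.5}


def adjusted_rate(rate_percent, expression):
    """Raise or lower the application rate for the current expression."""
    scale = EXPRESSION_RATE_SCALE.get(expression, 1.0)
    return min(100.0, rate_percent * scale)


def compose_face(expression_pose, mouth_open):
    """Layer the lip sync mouth opening on top of an expression pose.

    `expression_pose` maps hypothetical channel names to values in 0.0..1.0.
    The mouth opening is added to the pose's own `jaw_open` value and the
    sum is clamped to 1.0; all other channels are left unchanged.
    """
    face = dict(expression_pose)
    face["jaw_open"] = min(1.0, face.get("jaw_open", 0.0) + mouth_open)
    return face
```

For example, `compose_face({"brow_raise": 0.5, "jaw_open": 0.25}, 0.5)` keeps the raised brows of the expression and opens the jaw to 0.75.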

The division of processing among the processing units described above is only an example. The processing may be performed by fewer processing units (for example, a single processing unit) or by more finely subdivided processing units. The functions of the processing units are implemented by the game program executed by the CPU 101 described later (see FIG. 9), but some of them may instead be implemented by actual devices such as dedicated integrated circuits (ASICs, FPGAs, and the like) or other electric circuits.

<4. Processing Procedure Executed by the Information Processing Device>
Next, an example of the processing procedure executed by the CPU 101 of the information processing device 3 is described with reference to FIG. 8.

In step S5, the information processing device 3 uses the motion pattern acquisition processing unit 13 to acquire the mouth-shape motion pattern of the character corresponding to the predetermined voice sequence. When a plurality of characters speak, a motion pattern is acquired for each character.

In step S10, the information processing device 3 uses the volume level acquisition processing unit 15 to start acquiring the volume level of the voice to be uttered by the character. From then on, the volume level is acquired continuously at predetermined time intervals.

In step S15, the information processing device 3 uses the smoothing processing unit 23 to start smoothing the volume level acquired in step S10. From then on, the smoothing is performed continuously, each time a volume level is acquired at the volume level acquisition interval.

In step S20, the information processing device 3 uses the reset processing unit 21 to determine whether the volume level has started to rise from 0 (including values near 0; the same applies below). If the volume level remains at 0 (step S20: NO), the process proceeds to step S25.

In step S25, the information processing device 3 uses the application rate determination processing unit 17 to set the application rate to 0.

In step S30, the information processing device 3 sets the flag F, which indicates whether the application rate is 0, to "0", meaning that the application rate is 0. The process then proceeds to step S55, described later.

If, on the other hand, the volume level has risen from 0 in step S20 (step S20: YES), the process proceeds to step S35.

In step S35, the information processing device 3 uses the reset processing unit 21 to reset the start position of the motion pattern so that the motion pattern is executed from the beginning.

In step S40, the information processing device 3 uses the application rate determination processing unit 17 to determine the application rate according to the smoothed volume level.

In step S45, the information processing device 3 uses the application rate determination processing unit 17 to determine whether the application rate has dropped to 0. This is equivalent to determining whether the volume level has dropped to 0 (including values near 0). If the application rate has dropped to 0 (step S45: YES), the process proceeds to step S30 described above and the flag F is set to "0". If the application rate is not 0 (step S45: NO), the process proceeds to step S50.

In step S50, the information processing device 3 sets the flag F, which indicates whether the application rate is 0, to "1", meaning that the application rate is not 0.

In step S55, the information processing device 3 uses the motion pattern execution processing unit 19 to execute the motion pattern, in time with the character's utterance, at a magnitude corresponding to the application rate determined in step S25 or step S40.

In step S60, the information processing device 3 uses the motion synthesis processing unit 25 to combine the motion of the character's facial expression with the character's mouth-shape motion pattern executed in step S55.

In step S65, the information processing apparatus 3 determines whether the flag F described above is "0", indicating that the application rate is 0. If the flag F is not "0" (step S65: NO), the process returns to step S40, and steps S40 to S65 are repeated until the flag F becomes "0" (that is, until the application rate becomes 0). If the flag F is "0" (step S65: YES), the process moves to step S70.

In step S70, the information processing apparatus 3 determines whether to end the lip sync processing, for example because a movie in the game has ended or execution of the game has ended. If the lip sync processing is not to end (step S70: NO), the process returns to step S20, and steps S20 to S30 and steps S55 to S70 are repeated until the volume level rises from 0. If the lip sync processing is to end (step S70: YES), this flow ends.

The processing procedure described above is one example. At least part of the procedure may be deleted or changed, and steps other than those above may be added. The order of at least part of the procedure may also be changed, and multiple steps may be combined into a single step.
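
The control flow around the flag F can be sketched roughly as follows. This is a simplified per-frame simulation, not the flowchart itself: the silence threshold, the placeholder mapping from volume level to application rate, the looping list-based pattern, and the name `run_lip_sync` are assumptions introduced for illustration, and the facial expression synthesis of step S60 is omitted.

```python
from typing import List, Sequence


def run_lip_sync(levels: Sequence[float], pattern: Sequence[float],
                 threshold: float = 0.02) -> List[float]:
    """Return the mouth openness produced for each frame of volume levels."""
    flag_f = 0      # flag F: 0 = application rate is 0, 1 = it is not
    position = 0    # current playback position in the motion pattern
    output = []
    for level in levels:  # running out of input plays the role of S70: YES
        # S40: determine the application rate (placeholder mapping, assumption).
        rate = 0.0 if level <= threshold else min(level, 1.0)
        if rate == 0.0:
            flag_f = 0            # S45: YES, then S30
        else:
            if flag_f == 0:
                position = 0      # S35: the level rose from 0, so restart
            flag_f = 1            # S50
        # S55: execute the pattern at a magnitude given by the rate.
        output.append(pattern[position % len(pattern)] * rate)
        if flag_f == 1:
            position += 1         # advance only while the character is speaking
    return output
```

For example, `run_lip_sync([0.0, 0.5, 0.5, 0.5, 0.0, 1.0, 1.0], [0.0, 1.0, 0.5])` restarts the pattern at each of the two points where the level rises from silence.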

<5. Hardware configuration of information processing apparatus>
Next, with reference to FIG. 9, an example of the hardware configuration of the information processing apparatus 3 will be described. This hardware realizes the processing units described above, which are implemented by programs executed by the CPU 101 and the like.

As shown in FIG. 9, the information processing apparatus 3 includes, for example, a CPU 101, a ROM 103, a RAM 105, a GPU 106, a dedicated integrated circuit 107 built for a specific application, such as an ASIC or FPGA, an input device 113, an output device 115, a recording device 117, a drive 119, a connection port 121, and a communication device 123. These components are connected via a bus 109, an input/output interface 111, and the like so that they can transmit signals to one another.

The game program can be recorded in, for example, the ROM 103, the RAM 105, or the recording device 117.

The game program can also be recorded, temporarily or permanently (non-transitorily), on a removable recording medium 125 such as a magnetic disk (for example, a flexible disk), an optical disk (for example, various CDs, MO disks, and DVDs), or a semiconductor memory. Such a recording medium 125 can also be provided as so-called packaged software. In this case, the game program recorded on the recording medium 125 may be read by the drive 119 and recorded in the recording device 117 via the input/output interface 111, the bus 109, and the like.

The game program can also be recorded on, for example, a download site, another computer, or another recording device (not shown). In this case, the game program is transferred via a network NW such as a LAN or the Internet, and the communication device 123 receives it. The program received by the communication device 123 may be recorded in the recording device 117 via the input/output interface 111, the bus 109, and the like.

The game program can also be recorded in, for example, an appropriate externally connected device 127. In this case, the game program may be transferred via an appropriate connection port 121 and recorded in the recording device 117 via the input/output interface 111, the bus 109, and the like.

The CPU 101 then executes various processes in accordance with the program recorded in the recording device 117, thereby realizing the processing by the motion pattern acquisition processing unit 13, the volume level acquisition processing unit 15, and so on. In doing so, the CPU 101 may, for example, read the program directly from the recording device 117 and execute it, or may first load it into the RAM 105 and then execute it. Furthermore, when the CPU 101 receives the program via the communication device 123, the drive 119, or the connection port 121, it may execute the received program directly without recording it in the recording device 117.

The CPU 101 may also, as necessary, perform various processes based on signals and information input from the input device 113, which includes the controller 5 described above and, for example, a mouse, a keyboard, and a microphone (not shown).

The GPU 106 performs processing for image display, such as rendering, in accordance with instructions from the CPU 101.

The CPU 101 and the GPU 106 then output the results of the above processing from the output device 115, which includes, for example, the display device 7 and the audio output unit described above. The CPU 101 and the GPU 106 may also, as necessary, transmit the processing results via the communication device 123 or the connection port 121, or record them in the recording device 117 or on the recording medium 125.

<6. Effects of the embodiment>
The game program of the present embodiment causes the information processing apparatus 3 to function as: the motion pattern acquisition processing unit 13, which acquires a mouth-shape motion pattern of a character corresponding to a predetermined voice sequence; the volume level acquisition processing unit 15, which acquires the volume level of the voice to be uttered by the character; the application rate determination processing unit 17, which determines, as an application rate, the degree of magnitude of the motion pattern to be applied to the character based on the volume level; and the motion pattern execution processing unit 19, which executes the motion pattern, in time with the character's utterance, at a magnitude based on the application rate.

Thus, in the present embodiment, a mouth-shape motion pattern of a character corresponding to a predetermined voice sequence is prepared in advance, and the motion pattern is executed while the magnitude of the motion is varied according to the volume level of the voice uttered by the character. Because vowel information and the like are then unnecessary, complicated arithmetic processing for analyzing the voice, for example, is also unnecessary, and the processing load on the information processing apparatus 3 can be greatly reduced. As a result, the technique is easy to apply to real-time voice input, so it can also be applied to, for example, chat, theater, and online distribution. In addition, when creating CG animation data, the effort of, for example, manually entering vowel information is saved, which reduces the burden of creating animation data.

In the present embodiment in particular, the motion pattern is generated so that the motion corresponding to the voice sequence starts from a state in which the character's mouth is closed, and the game program further causes the information processing apparatus 3 to function as the reset processing unit 21, which resets the start position of the motion pattern so that the motion pattern is executed from the beginning when the volume level starts to rise from 0 or near 0.

This avoids unnatural motion such as the mouth starting to close at the moment the character starts to speak. A natural mouth motion synchronized with the timing at which the character starts speaking (the start of utterance) can therefore be expressed.
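
The reset behavior can be illustrated with a small playback cursor that jumps back to the first frame of the pattern whenever the volume level rises out of silence. This is only a sketch: the class name `PatternPlayer`, the threshold of 0.02 standing in for "near 0", and the choice to hold the position while silent are assumptions, not details taken from the source.

```python
class PatternPlayer:
    """Tracks the playback position of a motion pattern of a given length."""

    def __init__(self, length: int, silence_threshold: float = 0.02) -> None:
        self.length = length
        self.silence_threshold = silence_threshold  # stand-in for "near 0"
        self.position = 0
        self.was_silent = True

    def advance(self, volume_level: float) -> int:
        """Advance one frame and return the pattern position to display."""
        silent = volume_level <= self.silence_threshold
        if self.was_silent and not silent:
            # Utterance onset: restart so the mouth opens from the closed pose.
            self.position = 0
        elif not silent:
            self.position = (self.position + 1) % self.length
        # While silent the position is held (assumption).
        self.was_silent = silent
        return self.position
```

Without the reset, an utterance could begin while the cursor sits in the closing half of the pattern, which is the unnatural case the text describes.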

In the present embodiment in particular, the application rate determination processing unit 17 sets the application rate to 0 when the volume level drops to 0 or near 0.

This allows the mouth to be closed at the moment the character finishes speaking. A natural mouth motion synchronized with the timing at which the character finishes speaking (the end of utterance) can therefore be expressed.
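
One possible way to map a volume level to an application rate while forcing the rate to 0 near silence is a linear ramp with a dead zone, sketched below. The linear shape, the threshold `silence_threshold`, the saturation point `full_level`, and the assumption that volume levels are normalized to roughly 0 to 1 are all illustrative choices that the source does not state.

```python
def application_rate(volume_level: float,
                     silence_threshold: float = 0.02,
                     full_level: float = 0.8) -> float:
    """Map a normalized volume level to an application rate in 0..1."""
    if volume_level <= silence_threshold:
        # "0 or near 0": treat as silence so the mouth ends up closed.
        return 0.0
    rate = (volume_level - silence_threshold) / (full_level - silence_threshold)
    return min(rate, 1.0)  # saturate once the voice is loud enough
```

The dead zone matters in practice because background noise or a smoothed signal may never reach exactly 0.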

In the present embodiment in particular, the game program further causes the information processing apparatus 3 to function as the smoothing processing unit 23, which smooths the volume level acquired by the volume level acquisition processing unit 15, and the application rate determination processing unit 17 determines the application rate based on the volume level smoothed by the smoothing processing unit 23.

If, for example, the application rate were determined from the raw volume level, a sudden change in the volume level would cause the magnitude of the character's mouth motion to fluctuate abruptly, resulting in unnatural motion.

According to the present embodiment, the application rate is determined after the volume level has been smoothed as described above, so unnatural motion caused by sudden changes in the volume level can be suppressed and the sense of unnaturalness felt by the user can be reduced.
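
As one illustration of smoothing, the sketch below uses an exponential moving average. This particular filter is a swapped-in example and may differ from the smoothing actually used in the embodiment; the class name `VolumeSmoother` and the coefficient `alpha` are hypothetical.

```python
class VolumeSmoother:
    """Exponential moving average over successive volume levels."""

    def __init__(self, alpha: float = 0.3) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha   # larger alpha follows the raw level more closely
        self.value = 0.0     # smoothed level, starting from silence

    def update(self, raw_level: float) -> float:
        """Feed one raw volume level and return the smoothed level."""
        self.value += self.alpha * (raw_level - self.value)
        return self.value
```

A filter of this kind decays toward 0 without reaching it exactly, which is one reason a "near 0" test is useful when deciding that the character has stopped speaking.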

In the present embodiment in particular, the game program further causes the information processing apparatus 3 to function as the motion synthesis processing unit 25, which synthesizes the character's mouth-shape motion pattern executed by the motion pattern execution processing unit 19 with the motion of the character's facial expression.

This makes it possible to combine lip sync with emotional expression through the character's facial expressions, such as a smiling, crying, or angry face, which improves the character's realism.
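
The source does not say how the two motions are combined. A minimal sketch, assuming both are expressed as named blend weights in the range 0 to 1 and combined additively with clamping, might look like this; the function name `synthesize` and the channel names are hypothetical.

```python
from typing import Dict


def synthesize(expression: Dict[str, float],
               mouth: Dict[str, float]) -> Dict[str, float]:
    """Add mouth-pattern weights onto expression weights, clamped to 0..1."""
    combined = dict(expression)  # leave the inputs unmodified
    for channel, weight in mouth.items():
        total = combined.get(channel, 0.0) + weight
        combined[channel] = min(1.0, max(0.0, total))
    return combined
```

For example, `synthesize({"smile": 0.75, "jaw_open": 0.25}, {"jaw_open": 0.5})` keeps the smile and opens the jaw further for speech.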

<7. Modifications>
The present invention is not limited to the above embodiment, and various modifications are possible within a range that does not depart from its gist and technical idea.

As described above, the present invention can greatly reduce the processing load on the information processing apparatus 3 and is therefore well suited to real-time voice input. For example, as shown in FIG. 10, a voice input device 27 such as a microphone may be connected to the information processing apparatus 3, and the player's voice may be input in real time and uttered by a character appearing in a game or the like. In this case, the volume level acquisition processing unit 15 acquires the volume level based on the volume of the player's voice input from the voice input device 27. The other processing is the same as in the above embodiment.
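
For real-time input, a volume level has to be derived from each block of captured audio. The source does not specify the measure; the sketch below assumes signed 16-bit PCM samples and uses the root mean square of a block, normalized to 0 to 1. The function name `volume_level` and the `full_scale` parameter are hypothetical.

```python
import math
from typing import Sequence


def volume_level(samples: Sequence[int], full_scale: float = 32768.0) -> float:
    """Return the RMS level of a block of 16-bit PCM samples, in 0..1."""
    if not samples:
        return 0.0  # no audio captured in this block
    mean_square = sum(s * s for s in samples) / len(samples)
    return min(1.0, math.sqrt(mean_square) / full_scale)
```

This needs only one pass over the samples per block and no frequency analysis, which is in keeping with the low processing load the text emphasizes.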

The above description uses the case where the lip sync processing program and the like of the present invention are applied to a game as an example, but the present invention is not limited to games. For example, it may be applied to various video content such as television programs and movies, and it can also be applied to chat, 3D theater, online distribution, and the like.

Besides what has already been described, the techniques of the above embodiment and the modifications may be combined as appropriate. Although not illustrated individually, the above embodiment and the modifications may be implemented with various changes within a range that does not depart from their gist.

Description of symbols
3 Information processing apparatus
13 Motion pattern acquisition processing unit
15 Volume level acquisition processing unit
17 Application rate determination processing unit
19 Motion pattern execution processing unit
21 Reset processing unit
23 Smoothing processing unit
25 Motion synthesis processing unit
125 Recording medium

Claims

1. A lip sync processing program that causes an information processing apparatus to function as:
a motion pattern acquisition processing unit that acquires a mouth-shape motion pattern of an object corresponding to a predetermined voice sequence;
a volume level acquisition processing unit that acquires a volume level of a voice to be uttered by the object;
an application rate determination processing unit that determines, as an application rate, a degree of magnitude of the motion pattern to be applied to the object based on the volume level; and
a motion pattern execution processing unit that executes the motion pattern, in time with the utterance of the object, at a magnitude based on the application rate.

2. The lip sync processing program according to claim 1, wherein
the motion pattern is generated so that the motion corresponding to the voice sequence starts from a state in which the mouth of the object is closed, and
the program further causes the information processing apparatus to function as
a reset processing unit that resets the start position of the motion pattern so that the motion pattern is executed from the beginning when the volume level starts to rise from 0 or near 0.

3. The lip sync processing program according to claim 1 or 2, wherein
the application rate determination processing unit sets the application rate to 0 when the volume level drops to 0 or near 0.

4. The lip sync processing program according to any one of claims 1 to 3, wherein
the program further causes the information processing apparatus to function as
a smoothing processing unit that smooths the volume level acquired by the volume level acquisition processing unit, and
the application rate determination processing unit determines the application rate based on the volume level smoothed by the smoothing processing unit.

5. The lip sync processing program according to any one of claims 1 to 4, wherein
the program further causes the information processing apparatus to function as
a motion synthesis processing unit that synthesizes the mouth-shape motion pattern of the object executed by the motion pattern execution processing unit with the motion of the facial expression of the object.

6. A recording medium readable by an information processing apparatus, on which the lip sync processing program according to claim 1 is recorded.

7. A lip sync processing method executed by an information processing apparatus, the method comprising:
acquiring a mouth-shape motion pattern of an object corresponding to a predetermined voice sequence;
acquiring a volume level of a voice to be uttered by the object;
determining, as an application rate, a degree of magnitude of the motion pattern to be applied to the object based on the volume level; and
executing the motion pattern, in time with the utterance of the object, at a magnitude based on the application rate.