JP5966971B2 - Information processing apparatus and program

Info

Publication number: JP5966971B2
Application number: JP2013039624A
Authority: JP
Inventors: 敏行幡田
Original assignee: Brother Industries Ltd
Current assignee: Brother Industries Ltd
Priority date: 2013-02-28
Filing date: 2013-02-28
Publication date: 2016-08-10
Anticipated expiration: 2033-02-28
Also published as: JP2014166240A

Description

The present invention relates to the technical field of outputting exercise content that supports exercise performed by a user while listening to music.

For example, Patent Document 1 discloses a technique for outputting auxiliary audio such as narration when outputting music together with an exercise video that shows exercise motions performed in time with the music.

JP 2010-178772 A

When support audio such as narration is output to support a user who is exercising, the display timing of the exercise video and the output timing of the support audio may drift apart. If the output timing of the support audio is corrected to remove this drift, the rhythm of the user's exercise motions may be disturbed. Suppose, for example, that support audio counting "1, 2, 3, 4" is output in time with the beat, and that the output timing of the "3" portion is advanced to remove the drift. The interval between the output of "2" and the output of "3" then becomes shorter than the beat interval that corresponds to the tempo, so the user cannot perform the exercise motions in rhythm.

The present invention has been made in view of the above, and its object is to provide an information processing apparatus and a program that can correct the output timing of audio that supports an exercising user without disturbing the rhythm in which that audio is output.

To solve the above problem, the invention according to claim 1 comprises: storage means for storing score information indicating the musical score of a piece of music, motion information indicating exercise motions, and audio information indicating audio that supports a user performing the exercise motions; first control means for causing display means to display video information showing the exercise motions being performed, based on the motion information stored in the storage means and on a synchronization signal generated according to a designated tempo and the score information stored in the storage means; second control means for causing output means to output music information indicating the music, based on the synchronization signal and the score information stored in the storage means; third control means for causing the output means to output the audio information stored in the storage means, based on the synchronization signal; first determination means for determining whether there is a deviation exceeding a predetermined time between the timing at which the video information is displayed on the display means and the timing at which the audio information is output by the output means; second determination means for determining, when the first determination means determines that there is such a deviation, whether the output portion of the audio information currently being output by the output means is an audio portion that is output in time with the beat; and correction means for correcting the output timing of the audio information when the second determination means determines that the output portion is not an audio portion that is output in time with the beat.

The invention according to claim 2 further comprises third determination means for determining whether the audio information stored in the storage means includes an audio portion that is output in time with the beat. When the third determination means determines that the audio information does not include such an audio portion, the correction means does not correct the output timing of the audio information. Only when the third determination means determines that the audio information includes such an audio portion does the second determination means determine whether the output portion currently being output by the output means is an audio portion that is output in time with the beat.

The invention according to claim 3 further comprises fourth determination means for determining, when the second determination means determines that the output portion is an audio portion that is output in time with the beat, whether there is a deviation exceeding a second predetermined time, longer than the predetermined time, between the timing at which the video information is displayed on the display means and the timing at which the audio information is output by the output means. The correction means corrects the output timing of the audio information when the fourth determination means determines that there is such a deviation.

The invention according to claim 4 further comprises fifth determination means for determining whether the output portion that the second determination means has determined to be an audio portion output in time with the beat is an audio portion output during the first beat of any measure of the score. When the fifth determination means determines that the output portion is an audio portion output during the first beat of a measure, the correction means does not correct the output timing of the audio information, regardless of whether there is a deviation exceeding the second predetermined time between the timing at which the video information is displayed on the display means and the timing at which the audio information is output by the output means.

In the invention according to claim 5, the storage means stores the audio information as data indicating the volume of the supporting audio along a time series. The invention further comprises: first specifying means for specifying the temporal output positions at which the volume reaches a local maximum in the time series indicated by the audio information stored in the storage means; second specifying means for specifying, among the output positions specified by the first specifying means, a plurality of output positions at which the volume reaches a local maximum at time intervals corresponding to the designated tempo; second storage means for storing range information indicating the temporal ranges corresponding to the plurality of output positions specified by the second specifying means; and acquisition means for acquiring the temporal output position of the output portion currently being output by the output means. The second determination means determines that the output portion is an audio portion output in time with the beat when the output position acquired by the acquisition means falls within a range indicated by the range information stored in the second storage means.
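The two specifying steps of claim 5 can be illustrated with a minimal Python sketch. It is not the patented procedure: the function names, the `floor` and `tol_s` parameters, and the half-width of each range are assumptions made for illustration. It finds local maxima in a sampled volume envelope, keeps the maxima that are spaced from another maximum by roughly a natural-number multiple of one beat, and turns them into time ranges.

```python
def local_maxima(volume, floor=0.0):
    """Indices where the sampled volume has a local maximum above `floor`."""
    return [i for i in range(1, len(volume) - 1)
            if volume[i] > floor and volume[i - 1] < volume[i] >= volume[i + 1]]


def beat_spaced(peak_times, beat_s, tol_s=0.05):
    """Keep peaks whose gap to some other peak is close to n * beat_s (n >= 1)."""
    kept = []
    for i, t in enumerate(peak_times):
        for j, u in enumerate(peak_times):
            if i == j:
                continue
            gap = abs(u - t)
            n = round(gap / beat_s)
            if n >= 1 and abs(gap - n * beat_s) <= tol_s:
                kept.append(t)
                break
    return kept


def beat_ranges(peak_times, half_width_s=0.1):
    """Hypothetical range information: a small window around each beat peak."""
    return [(t - half_width_s, t + half_width_s) for t in peak_times]


# Toy envelope sampled at 10 Hz: peaks at 0.5 s, 1.0 s, 1.5 s and a stray one at 2.2 s.
envelope = [0.0] * 30
for idx in (5, 10, 15):
    envelope[idx] = 1.0
envelope[22] = 0.8
peaks = [i / 10 for i in local_maxima(envelope)]
on_beat = beat_spaced(peaks, beat_s=0.5)  # tempo 120 -> one beat is 0.5 s
print(on_beat, beat_ranges(on_beat))
```

With a tempo of 120, the stray peak at 2.2 s is not a multiple of 0.5 s away from any other peak, so it is left out of the range information.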

In the invention according to claim 6, the storage means stores the audio information as data indicating the volume of the supporting audio along a time series, and the correction means preferentially corrects the output timing of the audio information when, among the audio portions other than those output in time with the beat, an audio portion whose volume is below a predetermined value is being output.

The invention according to claim 7 causes a computer to execute: a first control step of causing display means to display video information showing exercise motions being performed, based on a synchronization signal and on motion information stored in storage means, the storage means storing score information indicating the musical score of a piece of music, the motion information indicating the exercise motions, and audio information indicating audio that supports a user performing the exercise motions, and the synchronization signal being generated according to a designated tempo and the score information stored in the storage means; a second control step of causing output means to output music information indicating the music, based on the synchronization signal and the score information stored in the storage means; a third control step of causing the output means to output the audio information stored in the storage means, based on the synchronization signal; a first determination step of determining whether there is a deviation exceeding a predetermined time between the timing at which the video information is displayed on the display means and the timing at which the audio information is output by the output means; a second determination step of determining, when the first determination step determines that there is such a deviation, whether the output portion of the audio information currently being output by the output means is an audio portion that is output in time with the beat; and a correction step of correcting the output timing of the audio information when the second determination step determines that the output portion is not an audio portion that is output in time with the beat.

According to the invention of claim 1 or 7, the output timing of the audio information is corrected while an audio portion other than one that should be output in time with the beat is being output. The output timing of the audio can therefore be corrected without disturbing the rhythm in which the audio supporting the exercising user is output.

According to the invention of claim 2, unnecessary correction can be avoided when the audio information being output is of a kind for which a deviation between its output timing and the display timing of the video information causes no problem.

According to the invention of claim 3, correction is performed preferentially while an audio portion other than one that should be output in time with the beat is being output, but it may also be performed while an audio portion that should be output in time with the beat is being output. The timing deviation can therefore be eliminated quickly.

According to the invention of claim 4, no correction is performed while the audio portion being output is one that should be output during the first beat of a measure, which among the audio portions output in time with the beat is especially important for the user to keep rhythm. The timing deviation can therefore be eliminated quickly while the rhythm in which the audio supporting the exercising user is output is kept undisturbed.

According to the invention of claim 5, it can be determined appropriately whether an audio portion is one that should be output in time with the beat.

According to the invention of claim 6, correction can be performed preferentially while the supporting audio is silent or nearly silent. The audio heard by the user can therefore be prevented from becoming unnatural as a result of the correction.

FIG. 1 is a diagram showing an example of the schematic configuration of the exercise content generation system 1 of the present embodiment.

FIG. 2(A) is a diagram showing an example of a method of correcting the support audio, and FIG. 2(B) is a diagram showing a configuration example of the permission level table.

FIG. 3(A) is a graph showing an example of the waveform of the audio signal of the support audio, FIG. 3(B) is a graph showing an example of the waveform of a signal that takes the absolute value of that audio signal, and FIG. 3(C) is a graph showing an example of the waveform of a signal obtained by differentiating the signal shown in (B).

FIG. 4 is a flowchart showing an example of the permission level table generation processing performed by the CPU 51 of the output terminal 5.

FIG. 5(A) is a flowchart showing an example of the exercise content playback processing performed by the CPU 51 of the output terminal 5, and FIG. 5(B) is a flowchart showing an example of the synchronization processing performed by the CPU 51 of the output terminal 5.

FIG. 6 is a flowchart showing an example of the support audio playback processing performed by the CPU 51 of the output terminal 5.

Embodiments of the present invention are described below with reference to the drawings. The embodiment described below applies the present invention to an exercise content generation system that generates exercise content for supporting exercise. The exercise content includes video and audio for supporting exercise.

[1. Configuration of Exercise Content Generation System 1]
First, the configuration of the exercise content generation system 1 of the present embodiment is described with reference to FIG. 1. FIG. 1 is a diagram showing an example of the schematic configuration of the exercise content generation system 1 of the present embodiment. As shown in FIG. 1, the exercise content generation system 1 includes a distribution server 2 and one or more output terminals 5. The distribution server 2 and the output terminals 5 can be connected via a network 10. The network 10 includes, for example, the Internet. A database 3 is connected to the distribution server 2. Information about exercise and information about music are registered in the database 3. The distribution server 2 distributes the information registered in the database 3 to the output terminal 5 periodically or in response to a request from the output terminal 5.

The output terminal 5 is, for example, a terminal device installed in a facility 4. The output terminal 5 is an example of the information processing apparatus of the present invention. The output terminal 5 is used by a user 41 of the facility 4. The facility 4 may be, for example, a sports facility. At the sports facility, the user 41 takes an exercise lesson. An exercise lesson is a lesson made up of a plurality of exercise motions. The output terminal 5 in this case may be, for example, a personal computer.

The output terminal 5 can be connected to a monitor 57. The monitor 57 may be a display device that includes a plurality of speakers 64 and a display 67. In this case, the output terminal 5 can be connected to the display 67 and to the speakers 64. When the output terminal 5 outputs an audio signal to the monitor 57, music and the like are output from the speakers 64. When the output terminal 5 outputs a video signal to the monitor 57, the exercise video and the like are displayed on the display 67. The exercise video is a moving image showing a figure 83 performing exercise motions. The figure 83 is a virtual object in the form of, for example, a person, an animal, an imaginary creature, or a robot. The figure 83 is placed in a three-dimensional virtual space. The output terminal 5 outputs signals so that the music output from the speakers 64 and the movement of the figure 83 displayed on the display 67 are synchronized. Outputting the music and the exercise video is an example of outputting exercise content. The user 41 can exercise while listening to the music output from the speakers 64 and watching the figure 83 displayed on the display 67. An operator 42 can operate the output terminal 5 using a remote controller 66 or the like. The user 41 and the operator 42 may be the same person. When the facility 4 is a sports facility, the operator 42 may be, for example, an instructor.

[2. Synchronization of Exercise Video and Support Audio]
The exercise content that the monitor 57 outputs during an exercise lesson includes music, exercise video, and support audio. The support audio is audio for supporting the exercising user 41. For example, the support audio may include narration that explains the exercise motions or speech that coaches the exercise. The support audio may also include audio that is output in time with the beat of the music. Audio output in time with the beat of the music is, for example, audio whose volume reaches a local maximum at time intervals that are a natural-number multiple of the length of one beat at the tempo of the music. For example, when the tempo is 120, these intervals are 0.5 seconds, 1 second, 1.5 seconds, and so on. Examples of audio output in time with the beat include a spoken count such as "ichi, ni, san, shi" ("one, two, three, four") and hand clapping.
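The tempo example above can be checked with a short Python sketch. The function name and the `multiples` parameter are illustrative assumptions, and the tempo is taken to be in beats per minute.

```python
def beat_intervals(tempo_bpm: float, multiples: int = 3) -> list[float]:
    """Natural-number multiples of one beat, in seconds, for a tempo in BPM."""
    if tempo_bpm <= 0:
        raise ValueError("tempo must be positive")
    beat_s = 60.0 / tempo_bpm  # length of one beat in seconds
    return [beat_s * n for n in range(1, multiples + 1)]


print(beat_intervals(120))  # [0.5, 1.0, 1.5]
```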

To play back the exercise content, the output terminal 5 stores a music sequencer, a 3D engine, and a support audio playback program. The music sequencer is a program for playing back music based on MIDI data. MIDI data is data in MIDI (Musical Instrument Digital Interface) format and indicates the musical score of a piece of music. MIDI data is an example of the score information in the present invention. The music sequencer may include, for example, a software synthesizer. The 3D engine is a program for generating the exercise video by projecting the figure 83, which performs exercise motions in a three-dimensional virtual space, onto a two-dimensional plane. The support audio playback program is a program for playing back the support audio based on support audio data. The support audio data is data indicating the audio signal of the support audio. Specifically, the support audio data is, for example, data indicating a time series of sample values of the volume of the support audio taken at predetermined time intervals. The format of the support audio data may be, for example, WAV (RIFF waveform Audio Format), AAC (Advanced Audio Coding), or MP3 (MPEG Audio Layer-3). The support audio data is an example of the audio information in the present invention.

When playing back the exercise content, the output terminal 5 needs to play back the exercise video, the music, and the support audio in synchronization. By executing the music sequencer, the output terminal 5 generates events according to the designated tempo and the MIDI data. An event is a signal used for synchronization and is an example of the synchronization signal in the present invention. The tempo is, for example, specified in the MIDI data or specified in advance. The MIDI data describes delta times and events. A delta time indicates the time from the output of one event to the output of the next event. The events include normal MIDI events, extended events for 3D control, extended events for audio playback, and so on. A normal MIDI event is an event for controlling the playback of the music. Normal MIDI events are output to the synthesizer.
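The relationship between delta times, the tempo, and event output times can be sketched as follows. This is a simplified illustration rather than the sequencer described here: the `Event` class, the routing labels, and the resolution of 480 ticks per beat are assumptions, and tempo changes within a piece are ignored.

```python
from dataclasses import dataclass


@dataclass
class Event:
    delta_ticks: int  # ticks to wait after the previous event (delta time)
    target: str       # hypothetical routing label: "synth", "3d" or "voice"
    name: str


def schedule(events, tempo_bpm, ticks_per_beat=480):
    """Convert delta times into absolute output times in seconds."""
    seconds_per_beat = 60.0 / tempo_bpm
    ticks = 0
    timeline = []
    for ev in events:
        ticks += ev.delta_ticks
        when = ticks / ticks_per_beat * seconds_per_beat
        timeline.append((when, ev.target, ev.name))
    return timeline


events = [
    Event(0, "voice", "start support audio"),
    Event(0, "synth", "note on"),
    Event(480, "3d", "video sync"),
    Event(480, "3d", "video sync"),
]
for when, target, name in schedule(events, tempo_bpm=120):
    print(f"{when:.2f}s -> {target}: {name}")
```

Because every event time is derived from the same tick counter, events routed to different consumers share one clock, which is what lets them act as a common synchronization signal.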

An extended event for 3D control is an event extended to control processing by the 3D engine. The extended events for 3D control include, for example, an event for synchronizing the exercise video with the music. This event is called a video synchronization event. The video synchronization event is output to the 3D engine at, for example, time intervals corresponding to the designated tempo. A MIDI clock may be output to the 3D engine instead of the video synchronization event. The MIDI clock is an event output at time intervals corresponding to the tempo. By executing the 3D engine, the output terminal 5 generates the frame images that make up the exercise video at a predetermined frame rate and displays them on the display 67. Based on the video synchronization event, the output terminal 5 synchronizes the display timing of the exercise video with the output timing of the music. The synchronization method is described in detail later. As a result of this synchronization, the display of some of the frame images that make up the exercise video may be skipped, or playback of the exercise video may be paused momentarily. However, the user 41 is unlikely to notice that frame images were skipped or that playback of the exercise video paused momentarily. The output terminal 5 can synchronize the output timing of the music and the display timing of the exercise video at any time.

An extended event for audio playback is an event for playing back the support audio. The extended events for audio playback include, for example, an event that specifies the support audio data to be used for playback and an event that starts playback of the support audio using the specified support audio data. Extended events for audio playback are output to the support audio playback program. By executing the support audio playback program, the output terminal 5 starts outputting the support audio at the timing at which the extended event for audio playback is output.

With the extended event for audio playback, the output terminal 5 can synchronize the timing at which output of the support audio starts with the exercise video. However, after output of the support audio has started, the output timing of the support audio and the display timing of the exercise video may drift apart. For example, when the processing load on the output terminal 5 is high, the output timing of the support audio may be delayed. The output timing of the support audio may also run ahead. One conceivable approach is therefore to correct the output timing of the support audio after its output has started, so that it is synchronized with the display timing of the exercise video. For example, when the output timing of the support audio is late, the output terminal 5 skips part of the support audio, which advances the output timing. When the output timing of the support audio is early, the output terminal 5 pauses output of the support audio and then resumes it, which delays the output timing. However, if such a correction is performed while support audio that keeps time with the beat is being output, the rhythm in which the support audio is output is disturbed. Suppose, for example, that in support audio counting "ichi, ni, san, shi", a correction is performed between "ni" and "san". The interval from the output of "ni" to the output of "san" then deviates from the beat interval that corresponds to the designated tempo, and the user 41 cannot perform the exercise motions in rhythm.
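The two kinds of correction described above (skipping ahead when the support audio is late, pausing when it is early) can be sketched as one function. The function name, its arguments, and the choice of seconds as the unit are assumptions for illustration.

```python
def correct_position(position_s: float, elapsed_s: float) -> tuple[float, float]:
    """Return (new playback position, pause length), both in seconds.

    position_s: current playback position within the support audio.
    elapsed_s:  time elapsed on the reference clock (music / exercise video).
    """
    drift = elapsed_s - position_s  # > 0 means the support audio is late
    if drift > 0:
        # Late: jump forward, so the skipped stretch is never output.
        return elapsed_s, 0.0
    # Early (or exactly on time): hold output for -drift seconds, then resume.
    return position_s, -drift


print(correct_position(10.0, 10.5))  # late by 0.5 s  -> (10.5, 0.0)
print(correct_position(10.5, 10.0))  # early by 0.5 s -> (10.5, 0.5)
```

Either branch changes the spacing between two neighboring sounds by the size of the drift, which is why applying it in the middle of a spoken count breaks the beat interval.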

Therefore, when the display timing of the exercise video and the output timing of the support audio differ by more than a predetermined threshold time, the output terminal 5 determines whether the audio portion currently being output from the speakers 64, out of the support audio indicated by the support audio data, is an audio portion that is output in time with the beat. When it determines that the audio portion being output is not an audio portion that is output in time with the beat, the output terminal 5 corrects the output timing of the support audio.
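The decision described in this paragraph can be sketched as follows. The representation of beat-aligned portions as a list of `(start, end)` ranges in seconds, the function names, and the 0.1 second default threshold are assumptions, not values taken from this document.

```python
def in_beat_range(position_s, beat_ranges):
    """True if the playback position lies inside any beat-aligned range."""
    return any(start <= position_s < end for start, end in beat_ranges)


def should_correct(elapsed_s, position_s, beat_ranges, threshold_s=0.1):
    """Correct only when the drift exceeds the threshold AND the portion
    currently being output is not one that keeps time with the beat."""
    if abs(elapsed_s - position_s) <= threshold_s:
        return False  # drift is small enough to ignore
    return not in_beat_range(position_s, beat_ranges)


ranges = [(0.0, 2.0)]  # e.g. a spoken count occupies the first two seconds
print(should_correct(1.5, 1.0, ranges))   # drifted, but inside the count -> False
print(should_correct(3.5, 3.0, ranges))   # drifted during narration      -> True
print(should_correct(3.05, 3.0, ranges))  # within the threshold          -> False
```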

A specific example follows. FIG. 2(A) is a diagram showing an example of a method of correcting the support audio. Suppose that, as the support audio, a count of "ichi, ni, san, shi" is output several times and is followed by narration. FIG. 2(A)(1) shows the progress of the music over time, for an example in which one measure consists of four beats. The display timing of the exercise video can be synchronized with the output timing of the music at any time, so FIG. 2(A)(1) also shows the progress of the exercise video. FIG. 2(A)(2) shows the change in the audio level of the support audio over time. The audio level indicates the volume. Each audio portion contained in the support audio corresponds to a playback position of the support audio. A playback position indicates the temporal position at which the corresponding audio portion should be output. This temporal position is the time that has elapsed since playback of the music started. The times T0 to T5, T11 to T15, and T25 to T29 shown in FIG. 2(A)(2) are playback positions. The interval between adjacent times corresponds to the length of one beat. For example, time T1 corresponds to "ichi". Accordingly, "ichi" should be output when time T1 has elapsed from the start of playback of the support audio.

From time T1 to T4, "ichi, ni, san, shi" is output. Here the output timing of the support voice matches the display timing of the motion video. After that, "san, shi" should be output at times T11 and T12. Suppose, however, that from this point on the output timing of the support voice lags the display timing of the motion video by the threshold or more. In practice, "san" is then output later than the moment at which time T11 has elapsed from the start of music playback. In other words, the time elapsed since the start of music playback and the playback position of the support voice no longer agree. At this point the output terminal 5 determines whether the audio part currently being output is one that is output in time with the beat. "San, shi" is such an audio part, so the output terminal 5 does not correct the support voice. Narration is then output from time T13 to T27. Narration is not an audio part that is output in time with the beat, so the output terminal 5 corrects the support voice. Specifically, as shown in FIG. 2(A)(4), it advances the playback position of the support voice by not outputting part of the audio corresponding to the range from T13 to T14.

The output terminal 5 may also correct the output timing of the support voice when the audio part currently being output is one that is output in time with the beat. In that case, however, the output terminal 5 gives correction higher priority when the current audio part is not output in time with the beat than when it is. Specifically, it changes the threshold used to judge the deviation between the display timing of the motion video and the output timing of the support voice: the threshold for audio parts output in time with the beat is made larger than the threshold for other audio parts. The smaller the threshold, the more likely a correction is to be performed.

In addition, when the audio part currently being output is one that is output in time with the beat and is output during the first beat of a measure, the output terminal 5 does not correct the output timing of the support voice. The reason is that the audio part output during the first beat of a measure is especially important for the user to keep the rhythm. For example, in FIG. 2(A)(2), the output terminal 5 performs no correction while the playback position of the support voice is between times T1 and T2.

Furthermore, among audio parts that are not output in time with the beat, the output terminal 5 gives correction the highest priority when the audio part currently being output is silent. For example, in FIG. 2(A)(2), a silent part is output between times T25 and T26, and the output terminal 5 corrects with the highest priority during this period. Specifically, it uses the smallest threshold for judging the deviation between the display timing of the motion video and the output timing of the support voice. This prevents the correction from making the support voice sound unnatural to the user 41. The output terminal 5 also treats an audio part whose audio level is at or below a predetermined value as silent.

When the support voice represented by one piece of support voice data contains no audio part that is output in time with the beat, the output terminal 5 need not correct the output timing of the support voice. In this case, even if the output timing of the support voice deviates from the display timing of the motion video, the rhythm of the user 41's exercise is not disturbed.

The process for determining whether an audio part is one that is output in time with the beat is described below. The output terminal 5 generates the information used for this determination in advance. Specifically, it generates a permission level table, which is table information in which ranges of support voice playback positions are registered in association with permission levels. The permission level table is an example of the range information in the present invention. A permission level indicates the degree of priority with which the support voice is corrected: the higher the permission level, the higher the priority of correction. One of levels 0 to 4 is set as the permission level. Level 0 is set for the range of playback positions of an audio part that is output in time with the beat and is output during the first beat of a measure. At playback positions corresponding to level 0, the support voice is not corrected. Level 1 is set for the range of playback positions of an audio part that is output in time with the beat and is output during a beat other than the first beat of a measure. Level 2 is set for the range of playback positions of a sounded part that is not output in time with the beat. Level 3 is set for the range of playback positions of a silent part that is not output in time with the beat. For levels 1 to 3, thresholds th1 to th3 are set, respectively, for judging the deviation between the display timing of the motion video and the output timing of the support voice. These thresholds satisfy th1 > th2 > th3. When the playback position of the audio part currently being output by the speaker 64 falls within a range corresponding to level 0 or 1 in the permission level table, the output terminal 5 determines that the audio part is one that is output in time with the beat. When that playback position falls within a range corresponding to level 2 or 3, the output terminal 5 determines that the audio part is not one that is output in time with the beat.
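The level-dependent decision described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names and the threshold values in seconds are hypothetical, and the text only requires th1 > th2 > th3. Whether the comparison is strict or inclusive is also an assumption here.

```python
from typing import Mapping

# Hypothetical thresholds in seconds for levels 1 to 3 (th1 > th2 > th3).
THRESHOLDS = {1: 0.30, 2: 0.15, 3: 0.05}


def is_beat_aligned(level: int) -> bool:
    """Levels 0 and 1 mark audio parts that are output in time with the beat."""
    return level in (0, 1)


def should_correct(level: int, drift_sec: float,
                   thresholds: Mapping[int, float] = THRESHOLDS) -> bool:
    """Decide whether to correct the support voice for a given timing drift."""
    if level == 0:
        # First beat of a measure: never corrected.
        return False
    # Assumption: correct when the drift reaches the level's threshold.
    return abs(drift_sec) >= thresholds[level]
```

With these illustrative values, a drift of 0.2 seconds is left alone during a counted beat (level 1) but corrected during narration (level 2) or silence (level 3).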

FIG. 2(B) shows a configuration example of the permission level table. In FIG. 2(B), level 3 is set for 0 seconds and level 0 is set for 10 seconds. This indicates that the audio part whose playback position lies between 0 and 10 seconds is silent. Level 1 is set for 10.250 seconds. This indicates that the audio part whose playback position lies between 10 and 10.250 seconds is output in time with the beat and is output during the first beat of a measure.
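One way to read such a table is that each entry marks the start of a range that extends to the next entry. Under that reading, a lookup might be sketched as below. The list-of-tuples layout and the function name are assumptions made for illustration; only the three rows shown come from the example above.

```python
from bisect import bisect_right

# (range start in seconds, permission level), sorted by start position.
# The three rows follow the example above; the layout itself is hypothetical.
TABLE = [(0.0, 3), (10.0, 0), (10.25, 1)]


def permission_level(table, position_sec):
    """Return the level of the range that contains position_sec."""
    starts = [start for start, _ in table]
    index = bisect_right(starts, position_sec) - 1
    if index < 0:
        raise ValueError("position precedes the first table entry")
    return table[index][1]
```

A binary search keeps the lookup cheap even when it runs on every drift check during playback.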

The output terminal 5 generates the permission level table based on the support voice data. FIG. 3(A) is a graph showing an example of the waveform of the audio signal of the support voice. The output terminal 5 generates the audio signal of the support voice from the support voice data, and then generates a signal that takes the absolute value of the audio level of that signal. In doing so, the output terminal 5 changes any audio level below a predetermined value to audio level 0. That is, audio whose level is below the predetermined value is treated as silence.
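The absolute-value and silencing step can be sketched on a list of samples. The function name and the sample representation (floats in a plain list) are assumptions for illustration; the floor value is whatever "predetermined value" an implementation chooses.

```python
def rectify_and_gate(samples, floor):
    """Take the absolute value of each sample and zero out levels below floor."""
    return [abs(s) if abs(s) >= floor else 0.0 for s in samples]
```

For instance, with a floor of 0.05 the samples `[0.5, -0.02, -0.7, 0.01]` become `[0.5, 0.0, 0.7, 0.0]`.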

FIG. 3(B) is a graph showing an example of the waveform of the signal that takes the absolute value of the support voice audio signal. From the signal shown in FIG. 3(B), the output terminal 5 identifies the playback positions at which the audio level reaches a local maximum. Specifically, the output terminal 5 generates a signal by differentiating the signal shown in FIG. 3(B). FIG. 3(C) is a graph showing an example of the waveform of this differentiated signal. The output terminal 5 then extracts zero-cross points, which are the playback positions at which the derivative of the signal shown in FIG. 3(B) crosses the value 0. Zero-cross points appear both where the audio level shown in FIG. 3(B) reaches a local maximum and where it reaches a local minimum, so the output terminal 5 extracts only the zero-cross points at which the audio level is a local maximum. For example, in FIG. 3(C), times P1 to P5 and so on are extracted as zero-cross points.
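On sampled data, the derivative can be approximated by the first difference, and a local maximum is where that difference changes from positive to non-positive. The sketch below assumes a smoothed envelope given as a list of levels and a known sample rate; both the function name and the use of a first difference in place of a true derivative are illustrative choices.

```python
def peak_positions(envelope, sample_rate):
    """Return playback positions (seconds) of local maxima in the envelope."""
    # First difference as a discrete stand-in for the derivative.
    diff = [b - a for a, b in zip(envelope, envelope[1:])]
    peaks = []
    for i in range(1, len(diff)):
        # Rising before sample i and not rising after it: a maximum at i.
        if diff[i - 1] > 0 and diff[i] <= 0:
            peaks.append(i / sample_rate)
    return peaks
```

Raw audio is noisy, so a practical version would likely smooth the envelope first; otherwise nearly every ripple produces a zero-cross.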

The output terminal 5 calculates the time interval between zero-cross points that appear consecutively in the signal shown in FIG. 3(C). For example, it calculates the intervals between times P1 and P2, P2 and P3, P3 and P4, P4 and P5, P5 and P6, and so on. Suppose the interval between two zero-cross points corresponds to the length of a beat. In that case, the output terminal 5 determines that the audio part in the range bounded by those two zero-cross points is one that is output in time with the beat. For example, when the tempo is 120, one beat is 0.5 seconds long. If the interval between zero-cross points is then a natural-number multiple of 0.5 seconds, the audio part in the range bounded by those zero-cross points is output in time with the beat. In FIG. 3(C), suppose the intervals between P1 and P2, P2 and P3, and P3 and P4 are each one beat long. The audio part in the range from P1 to P4 is therefore output in time with the beat. Suppose, on the other hand, that the intervals between P4 and P5 and between P5 and P6 do not correspond to the length of a beat. The audio part in the range from P4 to P6 is therefore not output in time with the beat.
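The interval test can be sketched as follows. Measured intervals will rarely be exact multiples of the beat length, so this sketch adds a tolerance; that tolerance, its default value, and the function names are assumptions not stated in the text.

```python
def beat_length(tempo_bpm):
    """Length of one beat in seconds (0.5 s at tempo 120)."""
    return 60.0 / tempo_bpm


def is_beat_interval(interval_sec, tempo_bpm, tolerance_sec=0.02):
    """True if the interval is close to a natural-number multiple of a beat."""
    beat = beat_length(tempo_bpm)
    multiple = round(interval_sec / beat)
    return multiple >= 1 and abs(interval_sec - multiple * beat) <= tolerance_sec


def classify_intervals(peaks, tempo_bpm):
    """Classify each pair of consecutive peak positions as beat-aligned or not."""
    return [is_beat_interval(b - a, tempo_bpm) for a, b in zip(peaks, peaks[1:])]
```

With peaks at 1.0, 1.5, 2.0, 2.5, 2.8, and 3.7 seconds and tempo 120, the first three intervals are classified as beat-aligned and the last two are not, which mirrors the P1 to P6 example.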

When the output terminal 5 has identified an audio part that is output in time with the beat, it determines whether the playback position of that audio part is the playback position of the first beat of a measure. The playback position of the first beat of a measure can be calculated, for example, from the tempo and the time signature set in the MIDI data. If the playback position of the audio part is that of the first beat of a measure, the output terminal 5 registers the playback position in the permission level table in association with level 0. Otherwise, it registers the playback position in association with level 1. For example, when time P1 is the playback position of the first beat of a measure, the output terminal 5 registers time P1 in association with level 0. In this case, times P2 to P4 are not playback positions of the first beat of a measure, so the output terminal 5 registers each of times P2 to P4 in association with level 1.
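A minimal sketch of the first-beat check follows, under simplifying assumptions that the text does not state: the tempo and time signature are constant, positions are measured from the start of the music, and the music begins on the first beat of a measure. The function names and the tolerance are hypothetical.

```python
def is_first_beat(position_sec, tempo_bpm, beats_per_measure, tolerance_sec=0.02):
    """True if position_sec falls on the first beat of a measure."""
    beat = 60.0 / tempo_bpm
    beat_index = round(position_sec / beat)
    on_beat = abs(position_sec - beat_index * beat) <= tolerance_sec
    return on_beat and beat_index % beats_per_measure == 0


def level_for_beat_part(position_sec, tempo_bpm, beats_per_measure):
    """Level 0 for the first beat of a measure, level 1 for any other beat."""
    return 0 if is_first_beat(position_sec, tempo_bpm, beats_per_measure) else 1
```

At tempo 120 in four-beat measures, a measure lasts 2 seconds, so positions 10.0 and 12.0 map to level 0 while 10.5 maps to level 1.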

When the output terminal 5 has identified an audio part other than one that is output in time with the beat, it determines whether that audio part is sounded. If the audio part is sounded, the output terminal 5 registers its playback position in the permission level table in association with level 2. If the audio part is silent, it registers the playback position in association with level 3. In FIG. 3(C), the interval from time Ta to Tb, which lies between times P4 and P5, is silent. In this case, the output terminal 5 registers time P4 in association with level 2, time Ta in association with level 3, and time Tb in association with level 2.

[3. Configuration of each device]
Next, the configuration of each device included in the exercise content generation system is described with reference to FIG. 1.

[3-1. Configuration of distribution server 2]
As shown in FIG. 1, the distribution server 2 includes a CPU 21, a ROM 22, a RAM 23, a bus 24, an I/O interface 25, a display control unit 26, a disk drive 28, a network communication unit 30, and an HDD (hard disk drive) 37. The CPU 21 is connected to the ROM 22, the RAM 23, and the I/O interface 25 via the bus 24. The CPU 21 controls each unit of the distribution server 2 by executing programs stored in the ROM 22 or the HDD 37. The database 3, the display control unit 26, the disk drive 28, the network communication unit 30, a keyboard 31, a mouse 32, and the HDD 37 are connected to the I/O interface 25. The CPU 21 accesses the database 3 via the I/O interface 25. The display control unit 26 outputs a video signal to a monitor 27 under the control of the CPU 21. The disk drive 28 writes data to and reads data from a recording medium 29. The network communication unit 30 performs control for connecting the distribution server 2 to the network 10. The HDD 37 stores an OS, various control programs, and the like.

Data such as motion data, MIDI data, and support voice data are registered in the database 3. Motion data is registered for each exercise motion. It defines the exercise motion of the figure 83 in a three-dimensional virtual space and is used to display that motion on the display 67. The motion data contains the coordinates of each part of the body of the figure 83 as the exercise motion progresses. Motion data is an example of the exercise information in the present invention. It is registered in association with motion information, which is identification information indicating an exercise motion. Each piece of MIDI data is created to match a particular exercise motion. The reason is so that, while the music and the motion video are being output, a support voice suited to the exercise motion shown in the motion video can be output. For this purpose, an audio playback extended event must be described in the MIDI data. The tempo, for example, is also determined when the MIDI data is created; for instance, a tempo suited to the speed at which the exercise motion is performed is chosen. The tempo is described, for example, in the MIDI data. MIDI data is registered in association with motion information.

[3-2. Configuration of output terminal 5]
As shown in FIG. 1, the output terminal 5 includes a CPU 51, a ROM 52, a RAM 53, a bus 54, an I/O interface 55, a display control unit 56, a disk drive 58, a network communication unit 60, an audio output unit 63, a signal receiving unit 65, and an HDD 7. The CPU 51 is connected to the ROM 52, the RAM 53, and the I/O interface 55 via the bus 54. The CPU 51 has a clock function and a timer function. The CPU 51 controls each unit of the output terminal 5 by executing programs stored in the ROM 52 or the HDD 7. The HDD 7, the display control unit 56, the audio output unit 63, the disk drive 58, the network communication unit 60, a keyboard 61, a mouse 62, and the signal receiving unit 65 are connected to the I/O interface 55. The display control unit 56 outputs a video signal to a monitor 57 under the control of the CPU 51. The audio output unit 63 outputs an audio signal to the monitor 57 under the control of the CPU 51. The disk drive 58 writes data to and reads data from a recording medium 59. The signal receiving unit 65 receives signals output from a remote controller 66. The remote controller 66 is used by the operator 42 to operate the output terminal 5.

The HDD 7 is an example of the storage means and the second storage means in the present invention. The HDD 7 stores the motion data, MIDI data, support voice data, and the like distributed from the distribution server 2. The HDD 7 also stores lesson information, permission level tables, the thresholds th1 to th3, and the like. Lesson information defines the structure of an exercise lesson. For example, the output terminal 5 generates lesson information based on operations by the operator 42. An exercise lesson consists, for example, of a plurality of exercise motions, and a piece of music is assigned to each exercise motion that makes up the lesson. The music assigned to an exercise motion is the music output from the speaker 64 while the motion video of that exercise motion is shown on the display 67. Lesson information contains, for example, a plurality of pieces of motion information, each indicating an exercise motion that makes up the exercise lesson. The pieces of motion information are arranged in the order in which the exercise motions are performed. As described above, MIDI data is associated with motion information, and the music represented by the MIDI data is assigned to the exercise motion indicated by that motion information. A permission level table is stored in association with a lesson ID, which indicates the exercise lesson in which the permission level table is used.

The HDD 7 further stores various programs such as an OS, an exercise support program, a music sequencer, a 3D engine, a support voice playback program, and an audio device driver. The exercise support program is a program for supporting the exercise of the user 41. It causes the CPU 51, acting as a computer, to execute at least a first control step, a second control step, a third control step, a first determination step, a second determination step, and a correction step.

By executing the music sequencer, the CPU 51 outputs various events based on the MIDI data. Specifically, the CPU 51 sequentially reads, for example, pairs of a delta time and an event from the MIDI data. The CPU 51 sleeps for a period determined by the delta time and the specified tempo, and then outputs the event. When a normal MIDI event is output, the CPU 51 controls the output of the audio signal by the audio output unit 63 based on that event. In this way, the CPU 51 causes the audio output unit 63 to output the audio signal of the music to the speaker 64, and the speaker 64 outputs the music.
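The relationship between delta time, tempo, and sleep duration can be sketched as below. The text does not give the time resolution, so `ticks_per_quarter=480` is an assumed value, as is treating one quarter note as one beat; the function names are hypothetical. The sketch computes output times rather than actually sleeping.

```python
def delta_to_seconds(delta_ticks, tempo_bpm, ticks_per_quarter=480):
    """Convert a delta time in ticks to seconds at the given tempo."""
    return delta_ticks * 60.0 / (tempo_bpm * ticks_per_quarter)


def event_times(events, tempo_bpm, ticks_per_quarter=480):
    """Return (elapsed seconds, event) for each (delta ticks, event) pair."""
    elapsed = 0.0
    schedule = []
    for delta_ticks, event in events:
        # A sequencer would sleep for this long before emitting the event.
        elapsed += delta_to_seconds(delta_ticks, tempo_bpm, ticks_per_quarter)
        schedule.append((elapsed, event))
    return schedule
```

At tempo 120 with this resolution, a delta time of 480 ticks corresponds to 0.5 seconds, one beat.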

By executing the 3D engine, the CPU 51 plays back the motion video based on the motion data. Specifically, at time intervals determined by a predetermined frame rate, the CPU 51 generates the frame images that make up the motion video and displays them on the display 67. Generating a frame image involves projective transformation, clipping, hidden surface removal, shading, texture mapping, and the like. The CPU 51 sequentially outputs the frame images to the display control unit 56, so that a motion video in which the figure 83 performs the exercise motion is shown on the display 67. The CPU 51 stores and manages a 3D management playback time in the RAM 53. The 3D management playback time indicates the time elapsed since playback of the motion video started. The playback start time of the motion video coincides, for example, with the playback start time of the exercise content. The CPU 51 generates the frame image corresponding to the current 3D management playback time. For example, suppose the frame rate is 30 and the current 3D management playback time is 2 seconds. The CPU 51 then generates the 60th frame image from the beginning of the motion video. Meanwhile, the normal MIDI events output by the music sequencer are output at timings determined by the delta times described in the MIDI data. As a result, the output timing of the music and the display timing of the motion video may drift apart. The CPU 51 therefore synchronizes the display timing of the motion video with the output timing of the music, for example, by the following method. The CPU 51 counts the number of video synchronization events output by the music sequencer since playback of the motion video started. For example, suppose the frame rate is 30 and video synchronization events are output every 0.5 seconds. A current 3D management playback time of 1.9 seconds means that the 57th frame image from the beginning is to be displayed. Suppose, however, that the fourth video synchronization event is output at this moment. This means that 2 seconds have elapsed since music playback started. The output terminal 5 therefore changes the 3D management playback time to 2 seconds. That is, the CPU 51 sets the 3D management playback time equal to the time elapsed since music playback started, and then displays the 60th frame image, which corresponds to the updated 3D management playback time. The CPU 51 may, for example, align the 3D management playback time with the time elapsed since music playback started only when the difference between the two is equal to or greater than a predetermined value.
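The resynchronization arithmetic in this paragraph can be sketched as follows. The function names and the `min_drift_sec` parameter (standing in for the optional "predetermined value") are hypothetical, and the small epsilon in `frame_index` is an implementation detail added here to guard against floating-point rounding.

```python
import math


def frame_index(playback_time_sec, frame_rate):
    """Frame number (counted from the start) for a 3D management playback time."""
    # Epsilon avoids an off-by-one when the product lands just below an integer.
    return math.floor(playback_time_sec * frame_rate + 1e-9)


def resync(time_3d_sec, sync_event_count, sync_interval_sec, min_drift_sec=0.0):
    """Return the 3D management playback time after a video synchronization event."""
    music_elapsed = sync_event_count * sync_interval_sec
    if abs(music_elapsed - time_3d_sec) >= min_drift_sec:
        # Snap the video clock to the music clock.
        return music_elapsed
    return time_3d_sec
```

In the example from the text, a 3D management playback time of 1.9 seconds maps to frame 57; after the fourth synchronization event at 0.5-second intervals it becomes 2.0 seconds, which maps to frame 60.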

By executing the support voice playback program, the CPU 51 plays back the support voice. The CPU 51 starts playback of the support voice based on an audio playback extended event. Specifically, the CPU 51 sequentially reads data from the support voice data and sends it to the audio device driver. The audio device driver is software that provides an interface to the audio output unit 63. In addition, when the deviation between the display timing of the motion video and the output timing of the support voice is equal to or greater than the threshold, the CPU 51 corrects the output timing of the support voice. Details of the correction process are described later.

By executing the audio device driver, the CPU 51 causes the audio output unit 63 to output the audio signal of the support voice based on the data sent from the support voice playback program. The CPU 51 stores and manages, in the RAM 53, the time elapsed since playback of the support voice from one piece of support voice data started, as a driver management playback position. For example, the CPU 51 updates the driver management playback position as output of the audio signal progresses. When the processing load on the CPU 51 is high, output of the audio signal by the audio device driver may be delayed, so that the output timing of the support voice deviates from the display timing of the motion video. In this case, the driver management playback position advances more slowly.

The various programs may, for example, be downloaded from a server such as the distribution server 2 via the network 10. They may also be recorded on the recording medium 59 and read via the disk drive 58. The music sequencer and the 3D engine may be dedicated hardware rather than programs, in which case the output terminal 5 may include the 3D engine or the music sequencer as hardware.

[4. Operation of Exercise Content Generation System 1]
Next, the operation of the exercise content generation system 1 will be described with reference to FIGS. 4 to 6. FIG. 4 is a flowchart showing an example of the permission level table generation process performed by the CPU 51 of the output terminal 5. The CPU 51 generates lesson information based on operations by the operator 42. The CPU 51 acquires, from the HDD 7, the motion data and MIDI data corresponding to each of the plural pieces of motion information included in the lesson information. The CPU 51 concatenates the acquired pieces of motion data in the order in which the exercise motions are performed, thereby generating motion data for the exercise lesson corresponding to the lesson information. Likewise, the CPU 51 concatenates the acquired pieces of MIDI data in the order in which the exercise motions are performed, thereby generating MIDI data for the exercise lesson corresponding to the lesson information. The CPU 51 stores the generated motion data and MIDI data in the HDD 7 in association with the lesson ID of the generated lesson information. The CPU 51 may, for example, start the permission level table generation process at this point.

As shown in FIG. 4, the CPU 51 searches the generated MIDI data for an audio playback extended event by sequentially reading data such as delta times and events from the MIDI data. The CPU 51 then determines whether an audio playback extended event has been read (step S1). If the CPU 51 determines that an audio playback extended event has been read (step S1: YES), it proceeds to step S2.

In step S2, the CPU 51 acquires, from the HDD 7, the support audio data corresponding to the audio playback extended event that was read. The CPU 51 then generates, from the acquired support audio data, waveform data SG1 representing the audio signal of the support audio. The waveform data SG1 includes, for example, sample values of the audio level at predetermined time intervals. Next, the CPU 51 takes the absolute value of each sample value of the waveform data SG1, thereby generating waveform data SG2 of a signal representing the absolute value of the audio signal (step S3). Next, the CPU 51 changes to 0 those sample values in the waveform data SG2 that are below a predetermined threshold, thereby generating waveform data SG3 representing a signal in which audio whose level is below the predetermined value is treated as silence (step S4). The threshold may be set, for example, between -25 dB and -20 dB. Next, the CPU 51 uses a low-pass filter to remove signal components at or above a predetermined cutoff frequency from the signal represented by the waveform data SG3, thereby generating waveform data SG4 representing a signal from which noise has been removed (step S5). The low-pass filter may be implemented in hardware or in software. The cutoff frequency may be set, for example, between 100 Hz and 400 Hz. The waveform data SG4 represents the waveform of the signal in FIG. 3(B). Next, the CPU 51 calculates the differences between the sample values in the waveform data SG4, thereby generating waveform data SG5 representing the derivative of the signal represented by the waveform data SG4 (step S6).
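Steps S3 to S6 can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function names, the linear interpretation of the dB threshold relative to full scale, and the one-pole filter are not taken from the description, which leaves the filter type open. A one-pole filter attenuates rather than strictly removes components above the cutoff.

```python
import math


def db_to_linear(level_db):
    """Convert a level in dB (relative to full scale 1.0) to a linear amplitude."""
    return 10.0 ** (level_db / 20.0)


def envelope_pipeline(sg1, sample_rate, threshold, cutoff_hz):
    """Return (sg2, sg3, sg4, sg5) for raw samples sg1 (hypothetical helper)."""
    # Step S3: absolute value of every sample.
    sg2 = [abs(s) for s in sg1]
    # Step S4: samples below the threshold are treated as silence.
    sg3 = [s if s >= threshold else 0.0 for s in sg2]
    # Step S5: low-pass filtering (assumed here: one-pole IIR smoothing).
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    sg4, y = [], 0.0
    for s in sg3:
        y += alpha * (s - y)
        sg4.append(y)
    # Step S6: first difference as a discrete derivative.
    sg5 = [b - a for a, b in zip(sg4, sg4[1:])]
    return sg2, sg3, sg4, sg5
```

With a threshold of `db_to_linear(-20.0)`, samples whose magnitude is below 0.1 of full scale are zeroed before filtering.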

Next, based on the waveform data SG4, the CPU 51 extracts times P, each indicating a playback position that is a zero-cross point. A time P is the time elapsed since playback of the music started. The CPU 51 assigns a number to each time P in the order in which the zero-cross points appear in the waveform data SG4, and records the extracted times P in the RAM 53 (step S7). Next, the CPU 51 acquires the tempo from the MIDI data and calculates the length of one beat corresponding to that tempo. For each time P, the CPU 51 then calculates the time interval to the next time P. In the example of FIG. 3(C), for instance, the interval between time P1 and time P2 is calculated. The CPU 51 records, as a time TP, each time P whose interval is a natural-number multiple of the length of one beat (step S8). The CPU 51 may record a time P as a time TP even if the interval between the times P differs slightly from a natural-number multiple of the length of one beat. Next, the CPU 51 acquires the time signature from the MIDI data. Based on the time signature, the CPU 51 records in the RAM 53, as a time TP0, each recorded time TP that corresponds to the first beat of a measure (step S9). For example, suppose that the time signature is 4/4 and the tempo is 120. One beat then lasts 0.5 seconds and one measure lasts 2 seconds, so playback positions in the ranges of 0 to 0.5 seconds, 2 to 2.5 seconds, and so on correspond to the first beat of a measure. Next, the CPU 51 records in the RAM 53, as a time TP1, each recorded time TP other than the times TP0 (step S10). Next, the CPU 51 records in the RAM 53, as a time TP2, each time P other than the times TP (step S11). Next, within the range between a time P recorded as a time TP2 and the next time P, the CPU 51 identifies the range of time in which the audio level is 0 as a silent section. The CPU 51 then records in the RAM 53, as a time TP3, the playback position at which the silent section starts (step S12). The CPU 51 also records in the RAM 53, as T2, the playback position at which the silent section ends. In the example of FIG. 3(C), for instance, within the range between times P4 and P5, time Ta is the playback position at which the silent section starts and time Tb is the playback position at which it ends.
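The classification in steps S8 to S11 can be illustrated with a short Python sketch. The function names, the tolerance `tol`, and the choice to leave the last time P unclassified (the description does not say how a time P without a successor is handled) are assumptions made for this example only.

```python
def beat_length(tempo_bpm):
    """Length of one beat in seconds for a tempo in beats per minute."""
    return 60.0 / tempo_bpm


def classify_peaks(peaks, tempo_bpm, beats_per_measure, tol=0.02):
    """Split the times P into (tp0, tp1, tp2); hypothetical helper."""
    beat = beat_length(tempo_bpm)
    measure = beat * beats_per_measure
    tp0, tp1, tp2 = [], [], []
    for p, nxt in zip(peaks, peaks[1:]):
        interval = nxt - p
        n = round(interval / beat)
        # Step S8: interval is (about) a natural-number multiple of one beat.
        on_beat = n >= 1 and abs(interval - n * beat) <= tol
        if not on_beat:
            tp2.append(p)   # Step S11: not in time with the beat.
        elif (p % measure) < beat:
            tp0.append(p)   # Step S9: falls in the first beat of a measure.
        else:
            tp1.append(p)   # Step S10: on the beat, but not the first beat.
    return tp0, tp1, tp2
```

For 4/4 time at tempo 120, `classify_peaks([0.0, 0.5, 1.0, 1.3, 2.0, 2.5], 120, 4)` puts 0.0 and 2.0 in TP0, 0.5 in TP1, and 1.0 and 1.3 in TP2.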

Next, the CPU 51 registers entries in the permission level table (step S13). Specifically, the CPU 51 registers each time TP0 in association with level 0, each time TP1 in association with level 1, each time TP2 in association with level 2, and each time TP3 in association with level 3. The CPU 51 then returns to step S1.
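One possible in-memory shape for the permission level table is a list of (time, level) pairs sorted by time. The sketch below assumes that layout, and it further assumes that a lookup returns the level of the latest entry at or before the queried time. The description only says that a level "corresponding to" a time is acquired, so the lookup rule and both function names are hypothetical.

```python
import bisect


def build_permission_table(tp0, tp1, tp2, tp3):
    """Step S13: associate TP0..TP3 with levels 0..3, sorted by time."""
    entries = []
    for level, times in enumerate((tp0, tp1, tp2, tp3)):
        entries.extend((t, level) for t in times)
    return sorted(entries)


def lookup_level(table, t):
    """Return the level of the latest entry at or before t, or None."""
    times = [entry[0] for entry in table]
    i = bisect.bisect_right(times, t) - 1
    return table[i][1] if i >= 0 else None
```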

If, in step S1, the CPU 51 detects the end of the MIDI data and therefore determines that no audio playback extended event has been read (step S1: NO), it ends the permission level table generation process. The CPU 51 stores the generated permission level table in the HDD 7 in association with the lesson ID of the generated lesson information.

FIG. 5(A) is a flowchart showing an example of the exercise content playback process performed by the CPU 51 of the output terminal 5. For example, an instructor operates the output terminal 5 to select the exercise lesson to be conducted. The CPU 51 acquires, from the HDD 7, the MIDI data, motion data, and permission level table corresponding to the lesson ID of the selected exercise lesson, and starts the exercise content playback process based on the acquired data. When starting the exercise content playback process, the CPU 51 acquires the current time from its clock function and stores it in the RAM 53 as the exercise content playback start time. The CPU 51 also acquires the tempo from the MIDI data. In the exercise content playback process, the CPU 51 sequentially reads pairs of a delta time and an event from the beginning of the MIDI data and executes processing according to each event read.

Specifically, as shown in FIG. 5(A), the CPU 51 reads one pair of a delta time and an event from the MIDI data (step S21). The CPU 51 then determines whether the end of the MIDI data was detected in step S21 (step S22). If the CPU 51 determines that the end of the MIDI data has not been detected (step S22: NO), it sleeps for a time determined by the delta time read and the specified tempo, and then proceeds to step S23.

In step S23, the CPU 51 determines whether the event read is a normal MIDI event. If it is (step S23: YES), the CPU 51 proceeds to step S24. If it is not (step S23: NO), the CPU 51 proceeds to step S25. In step S24, the CPU 51 outputs the normal MIDI event read to the software synthesizer, and then returns to step S21.

In step S25, the CPU 51 determines whether the event read is a 3D control extended event. If it is (step S25: YES), the CPU 51 proceeds to step S26. If it is not (step S25: NO), the CPU 51 proceeds to step S27. An event that is neither a normal MIDI event nor a 3D control extended event is an audio playback extended event. In step S26, the CPU 51 outputs the 3D control extended event read to the 3D engine, and then returns to step S21.

In step S27, the CPU 51 outputs the audio playback extended event read to the support audio playback program, and then returns to step S21. If, in step S22, the CPU 51 determines that the end of the MIDI data has been detected (step S22: YES), it ends the exercise content playback process.
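The loop of steps S21 to S27 amounts to a sleep-then-dispatch sequencer. The Python sketch below assumes that delta times are expressed in ticks with a fixed number of ticks per quarter note (`ticks_per_quarter`, a value the description does not give), that each event carries a simple kind tag, and that the three consumers are plain callables. All of these names are hypothetical.

```python
import time


def delta_to_seconds(delta_ticks, tempo_bpm, ticks_per_quarter=480):
    """Convert a delta time in ticks to seconds at the given tempo."""
    return delta_ticks * 60.0 / (tempo_bpm * ticks_per_quarter)


def play(events, tempo_bpm, synth, engine3d, voice, sleep=time.sleep):
    """Dispatch (delta_ticks, kind, payload) tuples in order."""
    for delta_ticks, kind, payload in events:   # S21; end of list = S22: YES
        sleep(delta_to_seconds(delta_ticks, tempo_bpm))
        if kind == "midi":      # S23: YES -> S24
            synth(payload)
        elif kind == "3d":      # S25: YES -> S26
            engine3d(payload)
        else:                   # S25: NO -> S27
            voice(payload)
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting.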

FIG. 6 is a flowchart showing an example of the support audio playback process performed by the CPU 51 of the output terminal 5. During the exercise content playback process, an audio playback extended event that starts playback of the support audio is output. The output terminal 5 then starts the support audio playback process. The CPU 51 initializes the read position for the support audio data corresponding to the audio playback extended event. The read position corresponds to the address at which each piece of data is stored in the support audio data. The CPU 51 sets the read position to the beginning of the support audio data.

Next, the CPU 51 determines whether at least one of level 0 and level 1 is registered in the permission level table (step S31). If at least one of them is registered (step S31: YES), the CPU 51 proceeds to step S32. If neither is registered (step S31: NO), the CPU 51 proceeds to step S33.

In step S32, the CPU 51 sets the correction flag to TRUE and then proceeds to step S34. In step S33, the CPU 51 sets the correction flag to FALSE and then proceeds to step S34. The correction flag is information indicating whether the support audio is to be corrected. When the correction flag is TRUE, the CPU 51 corrects the support audio; when it is FALSE, the CPU 51 does not.

In step S34, the CPU 51 sets the time elapsed since playback of the exercise content started as the support audio playback start time tb. For example, the CPU 51 acquires the current time from the clock function and calculates the time tb by subtracting the exercise content playback start time from the current time. Next, the CPU 51 acquires the 3D management playback time t1 from the 3D engine (step S35). Next, the CPU 51 acquires, from the audio device driver, the time t2 indicating the driver management playback position (step S36). Next, the CPU 51 acquires from the permission level table the permission level corresponding to the time tb + t2 (step S37). The time tb + t2 is the time elapsed from the start of playback of the exercise content to the present. Next, the CPU 51 calculates the difference diff by subtracting the time tb + t2 from the time t1 (step S38). The difference diff indicates the deviation between the display timing of the exercise video and the output timing of the support audio. Next, the CPU 51 calculates the absolute value ad of the difference diff (step S39). Next, the CPU 51 determines whether the correction flag is TRUE (step S40). If the correction flag is TRUE (step S40: YES), the CPU 51 proceeds to step S41. If the correction flag is not TRUE (step S40: NO), the CPU 51 proceeds to step S48.
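As a small worked example of steps S38 and S39, the following Python sketch computes diff and ad from t1, tb, and t2. The function name and the sample values are invented for illustration.

```python
def timing_gap(t1, tb, t2):
    """Return (diff, ad) where diff = t1 - (tb + t2) and ad = |diff|."""
    diff = t1 - (tb + t2)   # Step S38
    ad = abs(diff)          # Step S39
    return diff, ad


# Video at 15.5 s, support audio started at 12.0 s and has played 3.25 s:
# the audio is 0.25 s behind the video, so diff is positive.
diff, ad = timing_gap(15.5, 12.0, 3.25)
```

A positive diff means the video is ahead of the support audio; a negative diff means the support audio is ahead of the video.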

In step S41, the CPU 51 determines whether the absolute value ad is greater than the threshold th3. If the absolute value ad is less than or equal to the threshold th3 (step S41: NO), the CPU 51 proceeds to step S48. If the absolute value ad is greater than the threshold th3 (step S41: YES), the CPU 51 proceeds to step S42. In step S42, the CPU 51 determines whether the acquired permission level is 0. If the permission level is 0 (step S42: YES), the CPU 51 proceeds to step S48. If the permission level is not 0 (step S42: NO), the CPU 51 proceeds to step S43. In step S43, the CPU 51 determines whether the acquired permission level is 1. If the permission level is not 1 (step S43: NO), the CPU 51 proceeds to step S45. If the permission level is 1 (step S43: YES), the CPU 51 proceeds to step S44. In step S44, the CPU 51 determines whether the absolute value ad is greater than the threshold th1. If the absolute value ad is less than or equal to the threshold th1 (step S44: NO), the CPU 51 proceeds to step S48. If the absolute value ad is greater than the threshold th1 (step S44: YES), the CPU 51 proceeds to step S47.

In step S45, the CPU 51 determines whether the acquired permission level is 2. If the permission level is not 2 (step S45: NO), the CPU 51 proceeds to step S47. If the permission level is 2 (step S45: YES), the CPU 51 proceeds to step S46. In step S46, the CPU 51 determines whether the absolute value ad is greater than the threshold th2. If the absolute value ad is less than or equal to the threshold th2 (step S46: NO), the CPU 51 proceeds to step S48. If the absolute value ad is greater than the threshold th2 (step S46: YES), the CPU 51 proceeds to step S47.
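The branching of steps S40 to S46 reduces to a single predicate that says whether the synchronization process of step S47 should run. The Python sketch below follows the flow as described; the function name is hypothetical, and the threshold values in the usage lines are made-up numbers chosen only so that th3 < th2 < th1.

```python
def should_synchronize(ad, level, correction_flag, th1, th2, th3):
    """Return True when the flow of steps S40 to S46 reaches step S47."""
    if not correction_flag:   # S40: NO -> S48
        return False
    if ad <= th3:             # S41: NO -> S48
        return False
    if level == 0:            # S42: YES -> S48 (first beat of a measure)
        return False
    if level == 1:            # S43: YES -> S44
        return ad > th1
    if level == 2:            # S45: YES -> S46
        return ad > th2
    return True               # S45: NO (e.g. level 3) -> S47


# Hypothetical thresholds in seconds.
TH1, TH2, TH3 = 0.2, 0.1, 0.05
on_beat = should_synchronize(0.15, 1, True, TH1, TH2, TH3)
silent = should_synchronize(0.08, 3, True, TH1, TH2, TH3)
```

With these sample thresholds, a 0.15 s gap is tolerated while a beat-aligned portion (level 1) is playing, whereas a 0.08 s gap already triggers synchronization during a silent section (level 3).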

In step S47, the CPU 51 executes the synchronization process. FIG. 5(B) is a flowchart showing an example of the synchronization process performed by the CPU 51 of the output terminal 5. As shown in FIG. 5(B), the CPU 51 determines whether the difference diff is greater than 0 (step S61). If the difference diff is greater than 0 (step S61: YES), the CPU 51 proceeds to step S62. If the difference diff is 0 or less (step S61: NO), the CPU 51 proceeds to step S63. In step S62, the CPU 51 moves the read position for the support audio data toward the end of the support audio data by the number of bytes corresponding to the time indicated by the absolute value ad. In step S63, the CPU 51 sleeps for the time indicated by the absolute value ad. After completing step S62 or S63, the CPU 51 ends the synchronization process and proceeds to step S48.
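The synchronization process of steps S61 to S63 can be sketched in Python as follows. The conversion from time to bytes depends on the audio format, which the description does not state, so `bytes_per_second` is an assumed parameter (for example, 176400 for 44.1 kHz, 16-bit stereo PCM). The sketch does not align the new position to a sample-frame boundary.

```python
import time


def synchronize(read_pos, diff, bytes_per_second, sleep=time.sleep):
    """Return the new read position after the synchronization process."""
    ad = abs(diff)
    if diff > 0:
        # S61: YES -> S62. The audio is behind the video, so skip ahead.
        return read_pos + int(ad * bytes_per_second)
    # S61: NO -> S63. The audio is ahead of (or level with) the video, so wait.
    sleep(ad)
    return read_pos
```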

As shown in FIG. 6, in step S48 the CPU 51 reads from the support audio data an amount of data equal to the size of the buffer allocated in the RAM 53 and stores it in the buffer. The CPU 51 reads this amount of data starting from the current read position. Next, the CPU 51 determines whether data could be read from the support audio data (step S49). If data could be read (step S49: YES), the CPU 51 proceeds to step S51. In step S51, the CPU 51 moves the read position toward the end of the support audio data by the size of the buffer. The CPU 51 then sends the data stored in the buffer to the audio device driver and returns to step S35. If data could not be read (step S49: NO), the CPU 51 ends the support audio playback process. In this case, the read position has reached the end of the support audio data.
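The read-and-send loop of steps S48 to S51 is sketched below in Python, using a file-like object so that `read()` both returns the next chunk and advances the read position. The function name and the `send_to_driver` callable are hypothetical, and the per-iteration timing checks of steps S35 to S47 are omitted for brevity.

```python
import io


def stream_support_audio(source, buffer_size, send_to_driver):
    """Feed support audio to the driver in buffer-sized chunks."""
    while True:
        chunk = source.read(buffer_size)   # S48 (read() also advances the position)
        if not chunk:                      # S49: NO, end of the data reached
            break
        send_to_driver(chunk)              # S51: hand the buffer to the driver


sent = []
stream_support_audio(io.BytesIO(b"abcdefghij"), 4, sent.append)
```

In this usage example the ten bytes are delivered as chunks of 4, 4, and 2 bytes, after which the empty read ends the loop.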

As described above, according to the present embodiment, the CPU 51 determines whether there is a deviation exceeding a threshold between the display timing of the exercise video and the output timing of the support audio. When the CPU 51 determines that there is such a deviation, it determines whether the portion of the support audio being output from the speaker 64 is an audio portion that is output in time with a beat. When that portion is determined not to be an audio portion output in time with a beat, the CPU 51 corrects the output timing of the support audio. The output timing of the support audio can therefore be corrected without disturbing the rhythm at which the support audio is output.

1 Exercise content generation system
2 Distribution server
3 Database
5 Output terminal
7 HDD
51 CPU
52 ROM
53 RAM
56 Display control unit
60 Network communication unit
63 Audio output unit
57 Monitor
64 Speaker
67 Display

Claims

An information processing apparatus comprising:
storage means for storing musical score information indicating a musical score of a piece of music, motion information indicating an exercise motion, and audio information indicating audio that supports a user performing the exercise motion;
first control means for causing display means to display video information indicating a video in which the exercise motion is performed, based on a synchronization signal generated according to a specified tempo and the musical score information stored in the storage means, and on the motion information stored in the storage means;
second control means for causing output means to output music information indicating the music, based on the synchronization signal and the musical score information stored in the storage means;
third control means for causing the output means to output the audio information stored in the storage means, based on the synchronization signal;
first determination means for determining whether there is a deviation exceeding a predetermined time between the timing at which the video information is displayed on the display means and the timing at which the audio information is output by the output means;
second determination means for determining, when the first determination means determines that there is the deviation, whether the output portion of the audio information being output by the output means is an audio portion output in time with a beat; and
correction means for correcting the output timing of the audio information when the second determination means determines that the output portion is not an audio portion output in time with a beat.

The information processing apparatus according to claim 1, further comprising third determination means for determining whether the audio information stored in the storage means includes an audio portion output in time with a beat, wherein
the correction means does not correct the output timing of the audio information when the third determination means determines that the audio information does not include an audio portion output in time with a beat, and
the second determination means determines whether the output portion of the audio information being output by the output means is an audio portion output in time with a beat only when the third determination means determines that the audio information includes an audio portion output in time with a beat.

The information processing apparatus according to claim 1, further comprising fourth determination means for determining, when the second determination means determines that the output portion is an audio portion output in time with a beat, whether there is a deviation exceeding a second predetermined time, longer than the predetermined time, between the timing at which the video information is displayed on the display means and the timing at which the audio information is output by the output means, wherein
the correction means corrects the output timing of the audio information when the fourth determination means determines that there is the deviation.

The information processing apparatus according to claim 3, further comprising fifth determination means for determining whether the output portion determined by the second determination means to be an audio portion output in time with a beat is an audio portion output during the first beat of any measure of the musical score, wherein
the correction means does not correct the output timing of the audio information when the fifth determination means determines that the output portion is an audio portion output during the first beat of the measure, regardless of whether there is a deviation exceeding the second predetermined time between the timing at which the video information is displayed on the display means and the timing at which the audio information is output by the output means.

The information processing apparatus according to claim 1, wherein
the storage means stores the audio information indicating the volume of the supporting audio along a time series,
the information processing apparatus further comprises:
first specifying means for specifying temporal output positions at which the volume reaches a maximum in the time series indicated by the audio information stored in the storage means;
second specifying means for specifying, from among the output positions specified by the first specifying means, a plurality of output positions at which the volume reaches a maximum at time intervals corresponding to the specified tempo;
second storage means for storing range information indicating temporal ranges corresponding to the plurality of output positions specified by the second specifying means; and
acquisition means for acquiring the temporal output position of the output portion being output by the output means, and
the second determination means determines that the output portion is an audio portion output in time with a beat when the output position acquired by the acquisition means is within a range indicated by the range information stored in the second storage means.

The information processing apparatus according to claim 1, wherein
the storage means stores the audio information indicating the volume of the supporting audio along a time series, and
the correction means preferentially corrects the output timing of the audio information when, among audio portions other than the audio portions output in time with a beat, an audio portion whose volume is less than a predetermined value is being output.

A program that causes a computer to execute:
a first control step of causing display means to display video information indicating a video in which an exercise motion is performed, based on a synchronization signal generated according to a specified tempo and musical score information stored in storage means that stores the musical score information indicating a musical score of a piece of music, motion information indicating the exercise motion, and audio information indicating audio that supports a user performing the exercise motion, and on the motion information stored in the storage means;
a second control step of causing output means to output music information indicating the music, based on the synchronization signal and the musical score information stored in the storage means;
a third control step of causing the output means to output the audio information stored in the storage means, based on the synchronization signal;
a first determination step of determining whether there is a deviation exceeding a predetermined time between the timing at which the video information is displayed on the display means and the timing at which the audio information is output by the output means;
a second determination step of determining, when it is determined in the first determination step that there is the deviation, whether the output portion of the audio information being output by the output means is an audio portion output in time with a beat; and
a correction step of correcting the output timing of the audio information when it is determined in the second determination step that the output portion is not an audio portion output in time with a beat.