JP2011003991A

JP2011003991A - Information processor, operating method of the same, program

Info

Publication number: JP2011003991A
Application number: JP2009143529A
Authority: JP
Inventors: Kazue Kaneko; 和恵金子
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2009-06-16
Filing date: 2009-06-16
Publication date: 2011-01-06

Abstract

PROBLEM TO BE SOLVED: A conventional method changes the display time of each of a plurality of images according to the output time of the sound associated with that image. With that method, the output time of the whole slide show becomes too long when the reproduction time of a sound is extremely long compared with the display time of an image.

SOLUTION: This information processor includes a display means, an adjustment means, and an output control means. The display means displays each of a plurality of images for a fixed time in a predetermined order, and can display a plurality of images simultaneously. The adjustment means adjusts the start and end times of the output of the sound associated with each image, within a range that does not exceed the time for which that image is displayed, so that the sounds associated with the plurality of images are not output simultaneously. The output control means outputs the sounds based on the result of the adjustment by the adjustment means.

Description

The present invention relates to a technique for digest reproduction of a plurality of images and the sound associated with each image.

A function called a slide show presents still images sequentially, one at a time. Conventionally, it was used by lecturers, who advanced photographs and diagrams manually while explaining their contents.

With the spread of digital cameras, software has appeared that presents photographs accompanied by music such as background music (BGM). In addition, some digital cameras have a voice memo function that lets the user attach a spoken note to a photograph. When photographs with audio data are presented in a slide show that switches photographs at fixed intervals, the audio is cut off partway through if the audio data is longer than the display time of the photograph. To prevent this, a technique has been proposed that extends the display time when the audio data is longer than the display time (for example, Patent Document 1).

Patent Document 1: JP-A-10-145730

In the method of Patent Document 1, which changes the display time according to the length of the audio data, the display time may differ from slide to slide. When the audio reproduction time is very long relative to the image display time, the playback time of the whole slide show therefore becomes long.

Accordingly, an object of the present invention is to provide a technique that enables a viewer to grasp an overview of image data and audio data without requiring a very long reproduction time.

To solve the above problem, an information processing apparatus of the present invention causes a display unit to display each of a plurality of images for a fixed time in a predetermined order, and is capable of displaying two or more images simultaneously. The apparatus comprises: adjustment means for adjusting the start and end times of the output of the sound associated with each image, within a range not exceeding the time for which that image is displayed, so that the sounds associated with the plurality of images are not output simultaneously; and output control means for outputting the sounds based on the result of the adjustment by the adjustment means.

According to the present invention, a viewer can grasp an overview of image data and audio data without requiring a very long reproduction time.

FIG. 1 is a diagram showing the information processing apparatus.
FIG. 2 is a flowchart showing the initial arrangement process.
FIG. 3 is a flowchart showing the position adjustment process.
FIG. 4 is a diagram showing an example of the images of a slide show.
FIG. 5 is a diagram showing the symbols used in the arrangement and movement processes.
FIG. 6 is an explanatory diagram of the initial arrangement methods and the movement criteria.
FIG. 7 is a diagram showing an example of a time chart.
FIG. 8 is a diagram showing an example of slide show reproduction.
FIG. 9 is a diagram showing an example of slide show reproduction.
FIG. 10 is an explanatory diagram of the initial arrangement methods and the movement criteria.
FIG. 11 is a diagram showing an example of a time chart when voice section detection is performed.
FIG. 12 is a diagram showing an example of a cutting method when voice section detection is performed.
FIG. 13 is a diagram showing the symbols used for arrangement and movement.
FIG. 14 is an explanatory diagram of the initial arrangement methods and the movement criteria.
FIG. 15 is a flowchart of movement amount calculation.
FIG. 16 is a diagram showing an example of a time chart.
FIG. 17 is a diagram showing the reproduction frame setting for a moving image.
FIG. 18 is a diagram showing an example of the reproduction frame setting for a moving image when climax detection is performed.
FIG. 19 is a diagram showing the symbols used for arrangement and movement when no attention time is provided.
FIG. 20 is an explanatory diagram of the initial arrangement methods and the movement criteria.
FIG. 21 is a diagram showing an example of creating a time chart when the display times of the images differ.
FIG. 22 is a diagram showing an example of creating a time chart when the reproduction time is changed during video reproduction.

Preferred embodiments of the present invention are described below with reference to the drawings.

Example 1

FIG. 1A is a functional block diagram of the information processing apparatus according to the first embodiment.

The information processing apparatus includes an image storage unit 101, an image selection unit 102, a reproduction time adjustment unit 103, an audio editing unit 104, an image reproduction unit 105, an audio output unit 106, and an image display unit 107.

A digital camera, for example, is assumed as a single device that includes all of the above units.

When the information processing apparatus is a digital camera, it also includes an imaging unit 108 and has a function of displaying images captured by the imaging unit 108 on the image display unit 107.

The apparatus may also include an output interface 109, in which case it has a function of displaying the images on an external display 110 via the output interface 109.

Each element of the information processing apparatus of this embodiment is described below.

The image storage unit 101 stores images such as still images and moving images. The image selection unit 102 selects the images to be displayed on the image display unit 107. The reproduction time adjustment unit 103 adjusts the timing at which images are reproduced.

The audio editing unit 104 edits the audio data associated with an image as necessary. The image reproduction unit 105 reproduces (outputs) images according to the timing adjusted by the reproduction time adjustment unit 103.

The audio output unit 106 outputs the audio associated with the images. The image display unit 107 displays the images output from the image reproduction unit 105.

FIG. 1B is a hardware configuration diagram of the information processing apparatus according to this embodiment.
In the figure, the information processing apparatus includes a central processing unit (CPU) 2801, a random access memory (RAM) 2802, and a read only memory (ROM) 2803.
The information processing apparatus also includes a hard disk drive (HDD) 2804, a display 2805, a button 2806, and a speaker 2807.

Each component of the information processing apparatus of this embodiment is described below.

The CPU 2801 controls the entire apparatus, including the functions of the information processing apparatus. The RAM 2802 functions as the main storage device. The ROM 2803 stores programs and fixed data. The HDD 2804 stores programs and images.
The program that implements the functions of the apparatus may be stored in either the ROM 2803 or the HDD 2804.

The display 2805 implements the image display unit 107. The button 2806 is used to instruct the image reproduction unit 105 to start, end, pause, or resume playback. The speaker 2807 implements the audio output unit 106.

FIG. 4A shows an example of the images of a slide show reproduced by the information processing apparatus according to this embodiment. In this example, all images are still images, and audio data is associated with images 2, 3, 4, 6, 8, 9, and 10.

Images 1 to 10 are not displayed one at a time: each is displayed for a fixed time with staggered display start and end times, so that several images are displayed simultaneously.
Here, audio data means audio such as a voice memo recorded when the image was captured, ambient sound, or narration, sound effects, and music added at any time after capture.

FIG. 8 shows an example of the images displayed as a slide show on the image display unit 107. As time passes, the display on the image display unit 107 transitions in the order 801 to 820. Here, each image is displayed while moving from right to left.

FIGS. 2 and 3 are flowcharts showing the operation of setting the timing at which the audio data associated with the images is reproduced.

FIG. 2 is a flowchart for calculating the display timing of each image and performing the initial setting.

FIG. 3 is a flowchart for checking whether the reproduction timings of the audio data associated with the initially set images overlap, and for adjusting those timings so that they do not overlap.

First, the process of setting the display timing of each image is described with reference to the flowchart of FIG. 2.

In step S201, the image selection unit 102 selects an image to be used in the slide show from the image storage unit 101 according to a predetermined order.

Next, in step S202, the image selection unit 102 determines whether the selection of the images to be reproduced has been completed, and ends the process if it has.

If, on the other hand, the selection has not been completed, in step S203 the reproduction time adjustment unit 103 sets the timing for displaying the image selected in step S201.
Images displayed in sequence have staggered start and end times so that their display times partially overlap; the display timing set in step S203 corresponds to the initial setting.

Next, in step S204, the reproduction time adjustment unit 103 determines whether the selected image is a moving image. If it is not, in step S205 the reproduction time adjustment unit 103 determines whether audio data is associated with the image.

If the reproduction time adjustment unit 103 determines in step S205 that no audio data is associated with the selected image, the process returns to step S201 to process the next image.

If, on the other hand, the reproduction time adjustment unit 103 determines in step S205 that audio data is associated with the selected image, the process proceeds to step S206, where the reproduction time adjustment unit 103 sets the audio reproduction time.

The symbols used to describe how the audio reproduction time is set and changed are explained here with reference to FIGS. 5A and 5B.

FIG. 5A explains the variable symbols, and FIG. 5B illustrates the relationship between the image display time and the audio reproduction time.

In this example, an attention display section is provided, during which one of the simultaneously displayed images is displayed so that the viewer can tell it is the image currently in focus.

The light gray band 501 indicates the display time of the m-th image (from time PS(m) to time PE(m)), and the dark gray band 502 indicates the attention time during which the m-th image receives attention display (from time PAS(m) to time PAE(m)).

The wavy band 503 indicates the reproduction time of the audio associated with the m-th image (from time VS(m) to time VE(m)).
Time LVE(m) is the end position of the audio reproduced immediately before the audio associated with the m-th image; it is used to determine whether the reproduction timing of the audio data associated with the m-th image can be moved earlier.

FIG. 6A shows the initial arrangement methods. The following three methods are given as examples; other methods may also be used.
In FIG. 6, the suffix (m) denoting the m-th image is omitted from PS and the other symbols.

(1) Align the center of the audio data with the center of the attention time.
(2) Align the end of the audio data with the end of the attention time.
(3) Align the start of the audio data with the start of the attention time.
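As an illustration only, the three initial arrangement methods can be written as a small Python function. The name `initial_placement`, the integer method codes, and the use of seconds as floats are assumptions made for this sketch, not part of the described apparatus.

```python
def initial_placement(method: int, pas: float, pae: float, duration: float) -> tuple[float, float]:
    """Return (VS, VE) for an audio clip of `duration` seconds (hypothetical helper).

    pas, pae: attention start PAS and attention end PAE of the image.
    method 1: centre of the audio aligned with the centre of the attention time
    method 2: end of the audio aligned with the end of the attention time
    method 3: start of the audio aligned with the start of the attention time
    """
    if method == 1:
        vs = (pas + pae) / 2 - duration / 2
    elif method == 2:
        vs = pae - duration
    elif method == 3:
        vs = pas
    else:
        raise ValueError(f"unknown placement method: {method}")
    return vs, vs + duration
```

For an 8-second clip and an attention time from 4 s to 6 s, methods (1), (2), and (3) give (1.0, 9.0), (-2.0, 6.0), and (4.0, 12.0) respectively. A clip that starts before the image's display start is exactly what the subsequent boundary checks correct.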

After the reproduction time adjustment unit 103 performs the initial arrangement in step S206, the process proceeds to step S207, where the reproduction time adjustment unit 103 determines whether the reproduction time of the aligned audio data extends beyond the start or end boundary of the image's display time.

If it is determined in step S207 that a boundary is exceeded, in step S208 the audio editing unit 104 performs processing so that the portion of the audio data beyond the boundary is not reproduced, and the process returns to step S201.
Such processing includes, for example, specifying a part of the target audio data, or creating new audio data corresponding to that part of the target audio data.

If, on the other hand, it is determined in step S207 that no boundary is exceeded, the process returns to step S201 without executing step S208.
Steps S207 and S208 may be omitted. In that case, the audio is adjusted so as not to exceed the image display time in steps S307, S316, S325, S328, and S333 of the flowchart of FIG. 3, described later.

If the image is determined in step S204 to be a moving image, the reproduction time adjustment unit 103 sets a moving image reproduction frame in step S209.

Next, in step S210, the reproduction time adjustment unit 103 determines whether audio data is associated with the selected image; if not, the process returns to step S201.

If it is determined in step S210 that audio data is associated, the reproduction time adjustment unit 103 sets an audio reproduction frame in step S211, and the process returns to step S201.
The processing of steps S209 and S210 is described in detail later.

FIG. 7A is a time chart in which the display timing of the slide show images of FIG. 4A and the reproduction timing of their audio data have been set according to the flowchart of FIG. 2.

Each image is displayed for t seconds, and the number of images that can be displayed simultaneously is N (5 in this embodiment). The images are displayed with their start times shifted by t/N seconds, and within each image's display time the attention time (dark gray) is arranged so as not to overlap the attention time of any other image.
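The staggered display timing can be sketched as follows. The text states only that each image is shown for t seconds, that successive images start t/N seconds apart, and that attention times do not overlap; giving each attention time a length of t/N, centred in its display time, is an assumption chosen so that consecutive attention times abut exactly. `Slot` and `display_schedule` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    ps: float   # PS(m): display start
    pe: float   # PE(m): display end
    pas: float  # PAS(m): attention start
    pae: float  # PAE(m): attention end


def display_schedule(count: int, t: float, n: int) -> list[Slot]:
    """Display and attention intervals for `count` images shown t seconds each.

    Image m starts t/n seconds after image m-1. Assumption: the attention
    interval has length t/n and is centred in the display interval.
    """
    step = t / n
    slots = []
    for m in range(count):
        ps = m * step
        mid = ps + t / 2
        slots.append(Slot(ps=ps, pe=ps + t, pas=mid - step / 2, pae=mid + step / 2))
    return slots
```

With t = 10 and N = 5, the second image (index 1) is displayed from 2 s to 12 s, and its attention time (6 s to 8 s) begins exactly when the first image's attention time ends.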

The reproduction time of the audio associated with each image is also initially arranged in accordance with that image's display time.
In this embodiment, method (1), "align the center of the audio data with the center of the attention time," is used as the initial arrangement method.

However, the audio reproduction times overlap in places, and if the audio is reproduced as is, the sounds overlap and become hard to hear.

Therefore, the overlaps in audio reproduction timing are eliminated according to the flowchart of FIG. 3.

In step S301, the image selection unit 102 selects an image to be reproduced from the image storage unit 101.

Next, in step S302, the reproduction time adjustment unit 103 determines whether an image was selected in step S301; if not, the series of processes ends.

If an image was selected, in step S303 the reproduction time adjustment unit 103 determines whether there is audio associated with the selected image. Hereinafter, the audio associated with the selected image is referred to as the target audio.

If it is determined in step S303 that there is no target audio, the process of step S308 is executed.

If it is determined in step S303 that there is target audio, in step S304 the reproduction time adjustment unit 103 determines whether the target audio overlaps the display start position of the image.

If it does, in step S305 the reproduction start timing of the target audio is forcibly moved later, to a point where it no longer overlaps the display start time of the image.

In step S306, the reproduction time adjustment unit 103 determines whether the end of the target audio overlaps the display end position; if it does, the process of step S307 is executed.

In step S307, the audio editing unit 104 cuts the tail of the target audio at the display end position, and the process of step S308 is executed.
Here, cutting means designating part of the audio reproduction time, or generating separate audio data corresponding to that part. Alternatively, it means associating a reproduction start time and end time with the audio data so that reproduction starts or stops partway through the audio.

If it is determined in step S306 that they do not overlap, the process of step S308 is executed. Likewise, if it is determined in step S304 that the target audio does not overlap the display start position of the image, the process of step S308 is executed.
Steps S303 to S307 may be omitted if steps S207 and S208 of FIG. 2 have been performed.

In step S308, the immediately preceding position is set.
If it was determined in step S303 that there is no audio, the immediately preceding position is the display start position of the image following the image selected in step S302; otherwise, it is the end position of the audio associated with the image selected in step S302. When audio exists, that audio data becomes the immediately preceding audio for the subsequent processing.

That is, steps S301 to S308 process the first image, and step S309 and the subsequent steps, described below, process the second and subsequent images.

In step S309, the image selection unit 102 sequentially selects, from the image storage unit 101, the images displayed second and later.

In step S310, the reproduction time adjustment unit 103 determines whether all images to be reproduced have been processed; if so, the process of step S329 is executed.

If an image remains, the process proceeds to step S311, where the reproduction time adjustment unit 103 determines whether there is audio (target audio) associated with the selected image.

If the reproduction time adjustment unit 103 determines in step S311 that there is no audio, the process proceeds to step S326.

If the reproduction time adjustment unit 103 determines in step S311 that there is audio, it determines in step S312 whether the target audio overlaps the immediately preceding position; if not, the process of step S317 is executed.

If it is determined in step S312 that the target audio overlaps the immediately preceding position, the reproduction time adjustment unit 103 determines in step S313 whether there is immediately preceding audio.
The immediately preceding audio is set in the immediately-preceding-position update process and is the most recent audio data associated with an image displayed before the current image.

It is determined in step S313 that there is no immediately preceding audio only when no audio data is associated with any image displayed before the current image. In this case, in step S314, the reproduction time adjustment unit 103 forcibly moves the target audio data later, to a position where it does not overlap the immediately preceding position.
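The bookkeeping of step S308 and the forced shift of steps S312 to S314 might look like the sketch below. Treating "overlaps the immediately preceding position" as "starts before it", and the helper names `preceding_position` and `push_after`, are assumptions for illustration.

```python
from typing import Optional


def preceding_position(audio_end: Optional[float], next_display_start: float) -> float:
    """S308: the end of this image's audio if it has any, else the next image's display start."""
    return next_display_start if audio_end is None else audio_end


def push_after(vs: float, ve: float, prev_pos: float) -> tuple[float, float]:
    """S312/S314: if the target audio starts before prev_pos, shift it later to start there."""
    if vs < prev_pos:
        shift = prev_pos - vs
        return vs + shift, ve + shift
    return vs, ve
```

The shift keeps the clip's length unchanged; any resulting excess past the display end is left to the boundary handling described elsewhere in the flowchart.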

The process of step S315 is then executed.

If the reproduction time adjustment unit 103 determines in step S313 that there is immediately preceding audio, it determines in step S318 whether the immediately preceding audio can be moved earlier.

FIG. 6B shows examples of criteria for determining whether the immediately preceding audio can be moved earlier.

The following four criteria are given as examples; other criteria may also be used.

(1) The audio end position VE is after the attention start position PAS of the image.
(2) The audio end position VE is after the attention center position PAM of the image.
(3) The audio start position VS is after the attention start position PAS of the image.
(4) The audio end position VE is after the attention end position PAE of the image.

Under these criteria, the amount by which the audio can be moved is smaller for (4) than for (1).
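The four forward-movability criteria translate directly into comparisons. In this sketch, `vs` and `ve` belong to the immediately preceding audio, and `pas` and `pae` to the attention time of the image that audio belongs to. Taking PAM as the midpoint of PAS and PAE is an assumption, and `can_move_earlier` is a hypothetical name.

```python
def can_move_earlier(criterion: int, vs: float, ve: float, pas: float, pae: float) -> bool:
    """Return True if the preceding audio [vs, ve] may be moved earlier.

    Criteria (1)-(4) as listed for FIG. 6B; PAM is assumed to be the midpoint
    of the attention start and attention end.
    """
    pam = (pas + pae) / 2
    if criterion == 1:
        return ve > pas  # end after attention start
    if criterion == 2:
        return ve > pam  # end after attention centre
    if criterion == 3:
        return vs > pas  # start after attention start
    if criterion == 4:
        return ve > pae  # end after attention end
    raise ValueError(f"unknown criterion: {criterion}")
```

With initial arrangement (2) the audio ends exactly at PAE, so criterion (4) can never hold; with initial arrangement (3) it starts exactly at PAS, so criterion (3) can never hold.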

Care is also needed when combining an initial arrangement method with a movability criterion.

If initial arrangement (2), "align the end of the audio data with the end of the attention time," is used and criterion (4) is selected for moving earlier, the audio can never be moved earlier.

If initial arrangement (3), "align the start of the audio data with the start of the attention time," is used and criterion (3) is selected for moving earlier, the audio can never be moved earlier. The movable-amount entries in FIG. 6B are formulas giving the amount by which the audio can be moved under each criterion.

(A) is the amount by which the audio can be moved under the chosen movement criterion, and (B) is the length of the section in which the immediately preceding audio and the target audio overlap. If (A) > (B), the audio is moved by (B); if (A) ≤ (B), it is moved by (A).
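The rule "move by (B) if (A) > (B), otherwise by (A)" is simply the minimum of the two. FIG. 6B's formulas for (A) are not reproduced in this text, so the slack expressions below are an assumed reading consistent with the criteria listed above (for example, under criterion (2) the audio may move earlier until its end reaches PAM). `forward_shift` is a hypothetical name.

```python
def forward_shift(criterion: int, prev_vs: float, prev_ve: float,
                  pas: float, pae: float, target_vs: float) -> float:
    """Amount by which the immediately preceding audio is moved earlier (step S319).

    pas/pae: attention time of the image the preceding audio belongs to.
    (A) slack allowed by the criterion (assumed formulas);
    (B) overlap between the preceding audio and the target audio.
    """
    pam = (pas + pae) / 2  # assumption: attention centre is the midpoint
    slack = {
        1: prev_ve - pas,
        2: prev_ve - pam,
        3: prev_vs - pas,
        4: prev_ve - pae,
    }[criterion]                   # (A)
    overlap = prev_ve - target_vs  # (B)
    # (A) > (B): move by (B); (A) <= (B): move by (A). Never move a negative amount.
    return max(0.0, min(slack, overlap))
```

For preceding audio at (3, 9) with an attention time of 4 s to 6 s and criterion (2), a target starting at 7 s gives a shift of 2 s (the overlap), while a target starting at 4 s gives 4 s (the slack down to PAM).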

FIG. 7B schematically shows an example in which, while the audio data associated with image 3 is being processed, the audio data associated with image 2 (the immediately preceding audio) is moved earlier.

Criterion (2), "the audio end position is after the attention center position of the image," is used to determine whether the audio can be moved.

In this example, it is determined in step S318 that the audio data associated with image 2 can be moved, and the process of step S319 is executed.

In step S319, the reproduction time adjustment unit 103 moves the immediately preceding audio earlier.

The audio is moved by the movable amount.

FIG. 7C schematically shows the result of moving the audio data associated with image 2.

Next, in step S320, the reproduction time adjustment unit 103 determines whether the target audio overlaps the immediately preceding position after the move (in this case, the end of the moved immediately preceding audio).

If step S320 determines that they do not overlap, step S315 is executed.

If step S320 determines that they overlap, then in step S321 the reproduction time adjustment unit 103 determines whether the target audio data can be moved backward (later).

Step S321 is also executed when step S318 determines that the immediately preceding audio cannot be moved forward. In that case too, it determines whether the target audio data can be moved backward.

FIG. 6C schematically shows example criteria for determining whether the reproduction time of the audio data associated with an image can be moved toward the end of that image's reproduction time.

Four example criteria for determining whether the audio can be moved backward are listed below; other criteria may also be used.

(1) The audio start position VS is before the attention end position PAE of the image.
(2) The audio start position VS is before the attention center position PAM of the image.
(3) The audio end position VE is before the attention end position PAE of the image.
(4) The audio start position VS is before the attention start position PAS of the image.

Under these criteria, (4) permits less movement than (1). Care is also needed when combining an initial arrangement with a movement criterion. If initial arrangement (2), "align the end of the audio data with the end of the attention time," is combined with (3) as the backward criterion, the audio cannot move backward at all.

Similarly, if initial arrangement (3), "align the start of the audio data with the start of the attention time," is combined with (4) as the backward criterion, the audio cannot move backward.

The movable amount is calculated so that the audio data does not extend past the display end position.
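The exact formulas of FIG. 6D are not reproduced in this text. The following Python sketch therefore rests on one assumption: the chosen criterion (1) to (4) must still hold after the shift, with reaching the boundary itself allowed. The shift is also capped so that the audio end does not pass the display end PE. Under this assumption the sketch reproduces both incompatible combinations described above. `Slot` and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    """Times in seconds for one image; field names follow the patent's symbols."""
    PAS: float  # attention start
    PAM: float  # attention center
    PAE: float  # attention end
    PE: float   # display end
    VS: float   # audio start
    VE: float   # audio end


def backward_slack(s: Slot, criterion: int) -> float:
    """How far the audio can shift later while criterion (1)-(4) still holds.

    Assumption: the patent's FIG. 6D formulas may differ from this.
    """
    if criterion == 1:
        return s.PAE - s.VS   # VS stays before the attention end
    if criterion == 2:
        return s.PAM - s.VS   # VS stays before the attention center
    if criterion == 3:
        return s.PAE - s.VE   # VE stays before the attention end
    if criterion == 4:
        return s.PAS - s.VS   # VS stays before the attention start
    raise ValueError(f"unknown criterion {criterion}")


def backward_movable_amount(s: Slot, criterion: int) -> float:
    """Movable amount (A), also limited so that VE never passes PE."""
    return max(0.0, min(backward_slack(s, criterion), s.PE - s.VE))
```

With audio aligned to the attention time (VS = PAS, VE = PAE), criteria (3) and (4) both yield 0. This matches the two combinations noted above in which the audio cannot move backward.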

If step S321 determines that the audio data can be moved backward, the reproduction time adjustment unit 103 moves the target audio data backward in step S322.

The "movable amount" column in FIG. 6D gives the formula for the amount by which audio can move under each movement criterion. The movable amount for moving forward in FIG. 6C is constrained by the end position of the immediately preceding audio. The movable amount for moving backward in FIG. 6D is not subject to that constraint. However, moving the audio as far as possible makes it more likely that it will have to be moved forward again in later processing. The actual amount of movement is therefore determined as follows. Let (A) be the movable amount and (B) the length of the section where the immediately preceding audio and the target audio overlap. The audio is moved by (B) if (A) > (B), and by (A) if (A) ≤ (B).

Next, in step S323, the reproduction time adjustment unit 103 determines whether the target audio still overlaps the preceding position after being moved. If it does not, step S315 is executed.

If step S323 determines that the target audio still overlaps the preceding position after the move, step S324 is executed. Step S324 is also executed when step S321 determines that the audio data cannot be moved backward.

FIG. 7D schematically shows an example in which the immediately preceding audio cannot move forward and the target audio data cannot move backward.

Here, the target is the audio data of image number 4.

The audio data associated with image number 3 is the audio immediately preceding image number 4. It cannot move forward because it directly follows the audio data associated with image number 2.

Criterion (2), "the start position of the audio is before the attention center position of the image," is used to decide whether the audio can move backward. The audio data of image number 4 nevertheless cannot move backward, because its end position already coincides with the end of the display time.

In step S324, the reproduction time adjustment unit 103 determines whether the tail of the immediately preceding audio can be cut.

FIG. 7E schematically shows an example in which the tail of the immediately preceding audio cannot be cut.

Because the target audio data starts before the immediately preceding audio starts, the preceding audio cannot be cut.

Cutting anyway would require deleting the immediately preceding audio entirely and also cutting the audio before it.

In this case, the criterion for whether cutting is possible is that the immediately preceding audio must not disappear entirely.

Other criteria may be used instead. For example, the end position of the preceding audio may be required to lie after the attention start point, or after the attention center position.
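Cutting the tail of the preceding audio makes it end where the target audio starts. A minimal Python sketch of the cut check in step S324 follows. By default it applies the criterion stated above: something of the preceding audio must remain. The optional arguments model the alternative criteria. The parameter names `prev_PAS` and `prev_PAM` are hypothetical, and the sketch assumes the attention points refer to the preceding audio's own image.

```python
from typing import Optional


def can_cut_preceding_tail(prev_start: float, target_start: float,
                           prev_PAS: Optional[float] = None,
                           prev_PAM: Optional[float] = None) -> bool:
    """Decide whether the preceding audio may be cut where the target starts."""
    new_end = target_start          # the cut shortens the preceding audio to here
    if new_end <= prev_start:       # nothing of the preceding audio would remain
        return False
    if prev_PAS is not None and new_end <= prev_PAS:
        return False                # alternative: must end after attention start
    if prev_PAM is not None and new_end <= prev_PAM:
        return False                # alternative: must end after attention center
    return True
```

The FIG. 7E situation, where the target starts before the preceding audio, corresponds to `can_cut_preceding_tail(5.0, 4.0)` returning `False`.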

If step S324 determines that cutting is impossible, the process proceeds to step S314. There, the target audio data is forcibly moved backward to a position where it no longer overlaps. An example is shown in FIG. 7F.

In step S315, it is determined whether the end position of the target audio data overlaps (extends past) the display end position of the image. If it does, step S316 is executed.

In step S316, the audio editing unit 104 cuts the tail of the target audio data at the image display end position, and step S317 is executed.

FIG. 7G schematically shows an example before cutting, and FIG. 7H shows it after cutting.

If step S324 determines that the tail of the immediately preceding audio can be cut, the reproduction time adjustment unit 103 cuts it in step S325, and step S315 is executed.

FIG. 7I shows an example in which step S321 determines that the target audio data can be moved backward.

The audio data associated with image number 6, which is being processed, overlaps the audio data associated with image number 4. It is therefore moved within its movable range.

FIG. 7J shows the result after the move.

Because step S323 determines that the target audio still overlaps the preceding position after the move, the process proceeds to step S324. Step S324 determines whether the tail of the immediately preceding audio can be cut.

In the example of FIG. 7J, the tail of the immediately preceding audio is judged to be cuttable, and the reproduction time adjustment unit 103 cuts it in step S325.

FIG. 7J shows the state before cutting, and FIG. 7K shows it after cutting.

In step S317, the reproduction time adjustment unit 103 sets the end position of the audio data associated with the current image as the preceding position. The process then returns to step S309.

In step S326, it is determined whether there is immediately preceding audio. If there is none, the process returns to step S309; if there is, it proceeds to step S327.

In step S327, it is determined whether the immediately preceding audio extends beyond the display end position of its image. If it does not, the process returns to step S309. If it does, step S328 cuts the tail of the preceding audio so that its end does not exceed the display end position, and the process returns to step S309.

Steps S326 to S328 are needed only when steps S207 and S208 in the flowchart of FIG. 2 are omitted. If steps S207 and S208 are performed, steps S326 to S328 are unnecessary.

When step S310 determines that there are no more images to reproduce, step S329 is executed as termination processing.

In step S329, the reproduction time adjustment unit 103 determines whether the preceding position overlaps the end position. If it does not, the process ends.

If step S329 determines that the preceding position overlaps the end position, the reproduction time adjustment unit 103 determines in step S330 whether the immediately preceding audio can be moved forward.

If it can, the reproduction time adjustment unit 103 moves the immediately preceding audio forward in step S331 and then executes step S332. If it cannot, step S333 is executed.

In step S332, the reproduction time adjustment unit 103 determines whether the preceding position still overlaps the end position after the move. If they no longer overlap, the series of processing ends.

If step S332 determines that the preceding position still overlaps the end position after the move, the reproduction time adjustment unit 103 cuts the tail of the immediately preceding audio in step S333, and the process ends.

The processing from step S329 to the end is unnecessary if the immediately preceding audio has already passed through steps S315 and S328. It is also unnecessary when steps S207 and S208 in the flowchart of FIG. 2 are not omitted.

FIG. 7L shows the result of applying the above processing. In FIG. 7L, no portions of the audio data overlap.

As described above, the flowchart of FIG. 3 creates a time chart used to control sound output.
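The decision sequence of FIG. 3 is intricate. Below is a deliberately simplified Python sketch of one greedy pass in the same spirit; it is not the patent's procedure. The main assumptions are as follows:

- The preceding audio may move earlier down to its image's display start or the end of the audio before it, which stands in for the FIG. 6B criteria.
- The target may move later up to its own display end, which stands in for FIG. 6C/6D.
- The preceding tail is cut whenever the target starts after the preceding audio starts.
- The termination steps S329 to S333 are omitted.

`Item` and `adjust` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Item:
    PS: float                   # display start of the image
    PE: float                   # display end of the image
    VS: Optional[float] = None  # audio start (None means no audio)
    VE: Optional[float] = None  # audio end


def adjust(items: List[Item]) -> List[Item]:
    """Simplified greedy pass that removes audio overlaps (hypothetical sketch)."""
    prev: Optional[Item] = None      # last item that has audio
    before_prev_end = float("-inf")  # end of the audio preceding `prev`
    for cur in items:
        if cur.VS is None or cur.VE is None:
            continue                 # images without audio are skipped
        if prev is not None and cur.VS < prev.VE:
            overlap = prev.VE - cur.VS
            # Like S318/S319: shift the preceding audio earlier by min(A, B).
            fwd = max(0.0, prev.VS - max(prev.PS, before_prev_end))
            step = min(fwd, overlap)
            prev.VS -= step
            prev.VE -= step
            overlap -= step
            # Like S321/S322: shift the target later, not past its display end.
            if overlap > 0:
                step = min(max(0.0, cur.PE - cur.VE), overlap)
                cur.VS += step
                cur.VE += step
                overlap -= step
            if overlap > 0:
                if cur.VS > prev.VS:
                    prev.VE = cur.VS          # like S325: cut the preceding tail
                else:
                    shift = prev.VE - cur.VS  # like S314: force the target later
                    cur.VS += shift
                    cur.VE += shift
        # Like S315/S316: never play past this image's display end.
        cur.VE = min(cur.VE, cur.PE)
        if prev is not None:
            before_prev_end = prev.VE
        prev = cur
    return items
```

A forced shift followed by clipping can leave an empty interval (VS ≥ VE); a fuller version would have to handle that case explicitly.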

Following the time chart created by the flowchart of FIG. 3, the image reproduction unit 105 reproduces the slide show. As shown in FIG. 8, it outputs audio from the audio output unit 106 and images from the image display unit 107.

In the reproduced video, still images are displayed while moving from right to left; 801 to 819 show representative moments of this motion.

A still image moving from right to left grows gradually until it reaches the center, where it is displayed at maximum size. The time during which it is shown at maximum size in the center is the attention time.

The speech balloons in FIG. 8 only indicate which image's audio is being reproduced; they need not actually be displayed.

As the time chart of FIG. 7L shows, no audio data is associated with image number 1, so no audio is reproduced at 801 and 802.

The audio data of image number 2 is placed so that it extends across both the start and end boundaries of the attention time.

Audio reproduction starts at 803, shortly before image number 2 reaches its attention display. It continues through 804, within the attention time, until 805, after the attention time has passed. The frames 801 to 803 in FIG. 8 may be discrete, or the image may move, enlarge, and shrink continuously between 801 and 803.

The display form of the images is not limited to the example of FIG. 8.

FIG. 9 shows another example of the image display form.

Each still image is placed at a random position and is first displayed at a small size. Its size increases over time while its position stays fixed, it shrinks after the attention time has passed, and it is erased at the end of its display time.

As the time chart of FIG. 7L shows, no audio data is associated with image number 1, so no audio is reproduced at 901 and 902.

The audio data associated with image number 2 is placed so that it extends across both the start and end boundaries of the attention time.

Audio reproduction starts at 903, shortly before image number 2 reaches its attention display. It continues through 904, within the attention time, until 905, after the attention time has passed.

In FIGS. 5A and 5B and FIGS. 6B and 6C, an attention time is set within the display time. Placement positions and movement ranges are restricted so that at least part of the audio data overlaps the attention time. Alternatively, no attention time may be set, and the audio data may simply be placed so that it fits within the display time.

FIGS. 5C and 5D explain the symbols used for placing and moving the audio reproduction time in this case.

FIG. 5C describes the variable symbols, and FIG. 5D illustrates their positional relationships.

FIG. 10A shows the initial arrangement method used when no attention time is set. FIG. 10B shows the criteria for whether audio data can move forward, and FIG. 10C the criteria for whether it can move backward.

Eliminating the attention time widens the range over which audio can be repositioned. However, it makes it harder to tell which image the audio being reproduced belongs to. To indicate that image, a speech balloon may be displayed or the frame of the still image may be highlighted.

The slide show example in FIG. 4A considers only the length of the audio data, not its acoustic content. Audio data may contain useless sections, such as sections too quiet to hear or sections containing only noise. Excluding these sections shortens the audio data. FIG. 1C shows a block diagram for this purpose.

Elements identical to those in FIG. 1A are given the same reference numerals, and their description is omitted.

The voice section detection unit 1208 detects voice sections.

Various methods exist for voice section detection:
- detecting sections whose volume exceeds a certain level;
- judging how speech-like the sound is and detecting only sections that contain a human voice;
- classifying the sound type with multiple sound models and detecting the corresponding sections.

The method is not limited here.
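The first method, volume-based detection, can be sketched in a few lines of Python. The sketch assumes mono samples normalized to [-1, 1]. The frame length and RMS threshold are illustrative values, not values from the patent. A practical detector would typically add smoothing or hangover frames; they are omitted here for brevity.

```python
from typing import List, Optional, Sequence, Tuple


def detect_voice_sections(samples: Sequence[float], rate: int,
                          frame_ms: int = 20,
                          threshold: float = 0.02) -> List[Tuple[float, float]]:
    """Mark frames whose RMS exceeds `threshold` as voiced and merge
    consecutive voiced frames into (start, end) sections in seconds."""
    frame = max(1, rate * frame_ms // 1000)   # samples per frame
    sections: List[Tuple[float, float]] = []
    start: Optional[float] = None
    for i in range(0, len(samples), frame):
        chunk = samples[i:i + frame]
        rms = (sum(x * x for x in chunk) / len(chunk)) ** 0.5
        t = i / rate                          # frame start time in seconds
        if rms > threshold and start is None:
            start = t                         # a voice section begins
        elif rms <= threshold and start is not None:
            sections.append((start, t))       # the voice section ends
            start = None
    if start is not None:                     # voice runs to the end of the data
        sections.append((start, len(samples) / rate))
    return sections
```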

FIG. 4B shows an example of the images of a slide show reproduced by the video reproduction apparatus, together with the lengths of the audio data after voice section detection.

When voice section detection is applied to audio data 1301, a non-voice section 1302 and a voice section 1303 are detected.

The symbols used for placing the audio reproduction time are explained with reference to FIGS. 5E and 5F.

FIG. 5E describes the variable symbols, and FIG. 5F illustrates their positional relationships. Reference numeral 1401 denotes audio data, in which non-voice sections are shown in white and voice sections with wavy lines. The start position of the first voice section in the audio data is the audio start position (VS), and the end position of the last voice section is the audio end position (VE). In the example of FIG. 4B, the start of voice section 1303 is VS and its end is VE. When the audio data contains several voice sections, VS is the start of the first and VE is the end of the last.
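Deriving VS and VE from detected voice sections is a one-step computation. This small Python sketch assumes the sections are (start, end) pairs that do not overlap one another. `audio_bounds` is a hypothetical name.

```python
from typing import List, Optional, Tuple


def audio_bounds(sections: List[Tuple[float, float]]) -> Optional[Tuple[float, float]]:
    """VS = start of the first voice section, VE = end of the last one.

    Returns None when the audio data contains no voice section at all.
    """
    if not sections:
        return None
    ordered = sorted(sections)            # order sections by start time
    return ordered[0][0], ordered[-1][1]
```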

With VS and VE defined this way, the initial arrangement method and the movement criteria are the same as in FIG. 6, so their description is omitted.

FIG. 11A shows the time chart of the initial arrangement when the slide show of FIG. 4B is arranged by the flowchart of FIG. 2. Audio overlaps in several places. However, only the places where voice sections overlap need to be adjusted to remove the overlap.

FIG. 11B shows the result of applying the processing of the flowchart of FIG. 3 to FIG. 11A.

In FIG. 11B, non-voice sections of video number 3 and video number 4 overlap. If these non-voice sections are silent, no sounds collide even when both audio tracks are reproduced simultaneously. If a non-voice section contains noise or the like, processing such as suppressing reproduction of that non-voice section is performed.
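Because only overlapping voice sections require adjustment, the overlap test compares voice sections rather than whole clips. A minimal Python sketch follows. It assumes each clip's voice sections are given as (start, end) pairs on the shared slide-show timeline. `voice_overlap` is a hypothetical name.

```python
from typing import List, Tuple


def voice_overlap(a: List[Tuple[float, float]],
                  b: List[Tuple[float, float]]) -> float:
    """Total time during which voice sections of clip `a` and clip `b` overlap.

    Overlaps between non-voice parts are ignored; touching sections count as 0.
    """
    total = 0.0
    for s1, e1 in a:
        for s2, e2 in b:
            total += max(0.0, min(e1, e2) - max(s1, s2))
    return total
```

A result of 0.0 means the two clips can play as placed, even if their full extents, including silence, overlap.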

When audio that is too long is cut, the cut is made in a non-voice section. FIG. 12 shows examples of cutting methods. When the cut position falls in a voice section, as in FIG. 12A, two methods are available, shown in FIGS. 12B and 12C. In FIG. 12B, if the overlapping portion is a voice section, the cut position is shifted back to the immediately preceding non-voice section. This prevents the sound from being cut off in the middle of a word. In FIG. 12C, if the overlapping portion is a voice section, only the voice sections are gathered into the time before the cut position. This increases the amount of voice reproduced before the cut.
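The two cutting methods can be sketched in Python as follows. The sketch assumes voice sections are sorted, non-overlapping (start, end) pairs measured from the start of the clip. For FIG. 12C, it further assumes that whole voice sections are kept, in order, while they fit the available time. The patent does not specify how a partially fitting section is treated. Both function names are hypothetical.

```python
from typing import List, Tuple

Section = Tuple[float, float]  # (start, end) relative to the clip start


def cut_at_pause(sections: List[Section], cut: float) -> float:
    """FIG. 12B style: if `cut` falls inside a voice section, move it back
    to where that section starts, i.e. into the preceding non-voice part."""
    for start, end in sections:
        if start < cut < end:
            return start
    return cut


def pack_voice(sections: List[Section], budget: float) -> List[Section]:
    """FIG. 12C style: drop non-voice gaps and keep whole voice sections,
    in order, while their total length fits into `budget` seconds."""
    kept: List[Section] = []
    used = 0.0
    for start, end in sections:
        if used + (end - start) > budget:
            break
        kept.append((start, end))
        used += end - start
    return kept
```

For sections (0, 1), (2, 3.5), and (5, 6) and a cut at 3.0 s, the first method cuts at 2.0 s and keeps only 1.0 s of voice. The second method plays the first two sections back to back, 2.5 s of voice, within the same 3.0 s.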

Unlike a slide show that displays images one at a time, this approach displays multiple images with staggered start and end times. The display time of each image can therefore be lengthened without changing the total duration of the slide show. Consecutive audio data are repositioned relative to one another within the display time of each image. This lets long audio data borrow the time left unused by images that have no audio or only short audio, raising the maximum reproduction time available to the long audio.

In addition, multiple images are visible at the same time, and the reproduced audio always belongs to one of them. The audio of an image that is not visible is never reproduced.

Furthermore, the start and end positions of the audio of every displayed image are adjusted before reproduction. At least part of the audio is therefore reproduced for every image.

(Modification 1 of Embodiment 1)
In Embodiment 1, positions are adjusted so that the audio reproduction times do not overlap. Alternatively, a time span in which parts of the audio reproduction times overlap may be provided.

Within this time span, an acoustic effect may be applied: the volume of the earlier audio is gradually lowered to silence while the volume of the later audio is gradually raised.
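This effect is a crossfade. Below is a minimal linear version in Python. It assumes the overlap span is [fade_start, fade_end] in seconds and returns the gain to apply to the earlier and later audio at time t. The linear ramp is an illustrative choice; an equal-power curve is a common alternative that avoids a loudness dip mid-fade.

```python
from typing import Tuple


def crossfade_gains(t: float, fade_start: float, fade_end: float) -> Tuple[float, float]:
    """Return (earlier_gain, later_gain) for a linear crossfade."""
    if t <= fade_start:
        return 1.0, 0.0          # only the earlier audio is audible
    if t >= fade_end:
        return 0.0, 1.0          # only the later audio is audible
    x = (t - fade_start) / (fade_end - fade_start)
    return 1.0 - x, x            # earlier fades out while later fades in
```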

(Modification 2 of Embodiment 1)
In Embodiment 1, the image data are arranged and adjusted in order from the beginning. They may instead be arranged and adjusted in order from the end.

(Modification 3 of Embodiment 1)
In Embodiment 1, only the immediately preceding data is considered when judging whether audio can be moved. Movability may instead be judged recursively, for example by also considering the data immediately preceding that data, and so on.
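The recursive judgment can be sketched as a recursive capacity calculation in Python. Assumptions, not taken from the patent: clips are already free of overlaps and sorted by time. Each clip may move earlier only as far as its own display start PS, standing in for the FIG. 6 criteria. A clip can additionally use the free gap before it plus whatever its predecessor can yield. `forward_capacity` is a hypothetical name.

```python
from typing import List, Tuple

Clip = Tuple[float, float, float]  # (PS, VS, VE): display start, audio start, audio end


def forward_capacity(clips: List[Clip], i: int) -> float:
    """How far clip i can move earlier if its predecessors may also be
    pushed earlier, checked recursively back to the first clip."""
    PS, VS, _ = clips[i]
    own_limit = VS - PS                  # cannot start before its display start
    if i == 0:
        return max(0.0, own_limit)
    gap = VS - clips[i - 1][2]           # free time directly before clip i
    return max(0.0, min(own_limit, gap + forward_capacity(clips, i - 1)))
```

In the test data below, the third clip has only a 0.5 s gap in front of it. Considering only its immediate predecessor as in Embodiment 1, it could move 0.5 s. Recursively, it can move 1.5 s, because the earlier clips can shift as well.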

(Modification 4 of Embodiment 1)
Embodiment 1 uses speech balloons and frame highlighting to show visually which image's audio is being reproduced. In addition, the audio may be output on multiple channels with shifted phase, so that this can also be recognized by ear.

(Modification 5 of Embodiment 1)
In Embodiment 1, non-voice sections are omitted to shorten the audio to be reproduced. Speech rate conversion may additionally be applied to the voice sections to shorten their reproduction time.

Alternatively, voice sections in the middle may be omitted while the beginning and end portions are always reproduced.

(Modification 6 of Embodiment 1)
In Embodiment 1, speech recognition may be applied to the voice sections to convert them into text, and the reproduction time of the voice sections may then be shortened. In that case, the recognized text may be displayed while the image is shown.

(Embodiment 2)
In Embodiment 1 the images are all still images, whereas in Embodiment 2 at least some of them are moving images. FIG. 4C shows an example of the images of a slide show reproduced by the video reproduction apparatus. Video numbers 2 and 7 are moving images with sound; video number 2 is relatively short and video number 7 is relatively long.

When an image is a moving image, its playback time is also placed on the time chart.

For a moving image with sound, the audio and the video must stay synchronized, so the audio content cannot be shifted independently of the video content. The video content and the audio content are therefore moved together. When the content itself cannot be moved, the playback frame that defines where playback starts and ends is moved instead.

The block diagram of Embodiment 2 of the present invention is the same as that of FIG. 1A, so its description is omitted.

The flowcharts are the same as those of FIGS. 2 and 3. The following supplements the description with the processing specific to moving images.

Steps S209 to S211 of the flowchart in FIG. 2 perform the initial arrangement when the image is a moving image. Step S209 sets the moving image playback frame. FIG. 13 explains the symbols used for placing and moving the reproduction time. FIG. 13A describes the variable symbols, and FIGS. 13B and 13C illustrate their positional relationships: FIG. 13B shows the case where the moving image data is shorter than the display time, and FIG. 13C the case where it is longer.

The moving image data (IS, IE) may be longer than the display time (PS, PE). A time frame within the display time for playing the moving image is therefore provided as the moving image playback frame (MS, ME). Within the range specified by the moving image playback frame, an audio playback frame (VS, VE) for playing the audio is also provided. LVE is the end position of the immediately preceding audio. It is not shown in FIG. 5B, but it is used to determine whether data can be moved forward.

FIGS. 14A and 14B show the initial arrangement methods for setting the playback frames.

The following three methods are given as examples of the initial arrangement of the moving image playback frame; other methods may also be used.
1) Align the center of the moving image with the center of the attention time.
2) Align the end of the moving image with the end of the attention time.
3) Align the start of the moving image with the start of the attention time.
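The three initial placements, together with clipping the playback frame to the display time so that no video plays outside it, can be sketched in Python. The function name and the tuple it returns are hypothetical. The sketch assumes all times are seconds on the slide-show timeline and takes the attention center as the midpoint of PAS and PAE.

```python
from typing import Tuple


def place_video_frame(PS: float, PE: float, PAS: float, PAE: float,
                      length: float, method: int) -> Tuple[float, float, float, float]:
    """Return (IS, IE, MS, ME): the video's position on the timeline and its
    playback frame clipped to the display time [PS, PE]."""
    if method == 1:
        IS = (PAS + PAE) / 2 - length / 2   # center on the attention center
    elif method == 2:
        IS = PAE - length                   # end at the attention end
    elif method == 3:
        IS = PAS                            # start at the attention start
    else:
        raise ValueError(f"unknown method {method}")
    IE = IS + length
    MS = max(IS, PS)                        # no playback before the display start
    ME = min(IE, PE)                        # no playback after the display end
    return IS, IE, MS, ME
```

A short clip keeps its full length (MS = IS, ME = IE). A clip longer than the display time is trimmed, as described below for video number 7.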

FIG. 14A shows an example in which the moving image is shorter than the display time, and FIG. 14B an example in which it is longer. After placement at the initial position, the moving image playback frame is set so that no video is played for any portion extending beyond the boundaries of the display time.

Next, step S210 determines whether there is audio data. If there is, the audio playback frame is set in step S211. The following three initial maximum lengths of the audio playback frame, within the range of the moving image playback frame, are assumed; other lengths may also be used.
1) The length of the attention time.
2) The playback time of the moving image.
3) Twice the attention time.
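The three maximum lengths can be sketched as follows, always kept inside the moving image playback frame [MS, ME]. The source does not say where within the frame a shorter audio frame is positioned. This Python sketch assumes options 1 and 3 are centered on the attention center, and that the synchronized audio spans the whole video. `audio_frame` is a hypothetical name.

```python
from typing import Tuple


def audio_frame(MS: float, ME: float, PAS: float, PAE: float,
                option: int) -> Tuple[float, float]:
    """Initial audio playback frame (VS, VE) inside the video frame [MS, ME]."""
    if option == 2:
        return MS, ME                       # match the video's playback time
    attention = PAE - PAS
    if option == 1:
        max_len = attention                 # length of the attention time
    elif option == 3:
        max_len = 2 * attention             # twice the attention time
    else:
        raise ValueError(f"unknown option {option}")
    center = (PAS + PAE) / 2                # assumption: centered on attention
    return max(MS, center - max_len / 2), min(ME, center + max_len / 2)
```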

FIGS. 14A and 14B also show examples of setting the audio playback frame. Here, the process first determines whether the image is a moving image and then whether it has audio data. However, the file may also contain audio that is not synchronized with the moving image. If the moving image itself has no synchronized sound and a voice memo or music has been attached to it, that audio is handled in the same way as audio data attached to a still image.

When the processing of the flowchart of FIG. 2 is applied to the slide show example of FIG. 4C, the time chart of FIG. 16A is generated. Because the moving image of video number 2 is relatively short, its moving image playback frame is the same length as the original moving image. Because the moving image of video number 7 is relatively long, its moving image data is trimmed so that the playback frame fits within the video display time. In FIG. 16A the audio playback frame is set to twice the attention time: the audio playback frame of video number 2 keeps its original length, whereas that of video number 7 is trimmed to twice the attention time.

In the flowchart of FIG. 3, processing specific to moving images is required in steps S318 and S327, which determine whether the immediately preceding audio can move forward, and in step S321, which determines whether the target audio can move backward.

FIG. 15 shows a flowchart of the movement amount calculation. In step S2001, it is determined whether the item is a moving image. If it is not, the process proceeds to step S2006, where the movement amount of the audio playback time is calculated in the same way as in the first embodiment, and the process ends.
If it is determined in step S2001 that the item is a moving image, the process advances to step S2002, where it is determined whether the moving image playback frame can be moved. The moving image playback frame is movable if it can be shifted within the video display time. If, in the initial arrangement, the moving image extends past both the start and end boundaries of the display time, the moving image playback frame is fitted to the display time and cannot be moved beyond those boundaries.

If it is determined in step S2002 that the moving image playback frame is movable, the process advances to step S2003, where the movement amount of the moving image playback frame is calculated. FIGS. 14C and 14D show how the movement amount is calculated when the moving image playback frame is movable: FIG. 14C shows movement forward, and FIG. 14D shows movement backward. The following four examples are given as criteria for deciding whether the frame can move forward, but other criteria may be used.
1) The end of the moving image playback frame is after the attention start position of the video.
2) The end of the moving image playback frame is after the attention center position of the video.
3) The start of the moving image playback frame is after the attention start position of the video.
4) The end of the moving image playback frame is after the attention end position of the video.

The movable amount is set within a range in which the audio data does not overlap the immediately preceding audio and the moving image data does not extend before the start of the video display time. The movement amount is the smaller of the movable amount and the length of the overlapping section.
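The four forward criteria and the forward movable amount reduce to simple interval tests. The Python sketch below is illustrative only: `Span`, `can_move_forward`, and `forward_shift` are hypothetical names, times are assumed to be seconds on the slide-show timeline, and each criterion is assumed to be checked on the frame's current position before the shift.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical half-open time span [start, end) in seconds."""
    start: float
    end: float


def can_move_forward(frame: Span, attention: Span, rule: int = 1) -> bool:
    """Forward-movability criteria 1)-4) above; `rule` selects which one is used."""
    center = (attention.start + attention.end) / 2
    criteria = {
        1: frame.end > attention.start,
        2: frame.end > center,
        3: frame.start > attention.start,
        4: frame.end > attention.end,
    }
    return criteria[rule]


def forward_shift(video: Span, audio: Span, prev_audio_end: float,
                  display: Span, overlap: float) -> float:
    """How far the moving image frame (with its audio) moves toward earlier times.

    Movable amount: the audio must not start before the preceding audio ends,
    and the moving image must not start before the display time begins.
    The shift is the smaller of the movable amount and the overlap to remove.
    """
    movable = min(audio.start - prev_audio_end, video.start - display.start)
    return max(0.0, min(movable, overlap))
```

For example, with a display time of 10 to 20 s, a moving image frame at 12 to 17 s whose audio starts at 12.5 s, and preceding audio ending at 11 s, only 1.5 s of the 2 s overlap can be removed, because the preceding audio is the tighter limit.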

The following four examples are given as criteria for deciding whether the frame can move backward, but other criteria may be used.
1) The start of the moving image playback frame is before the attention end position of the video.
2) The start of the moving image playback frame is before the attention center position of the video.
3) The end of the moving image playback frame is before the attention end position of the video.
4) The start of the moving image playback frame is before the attention start position of the video.

The movable amount is set within a range in which the moving image data does not extend beyond the boundary of the video display time. The movement amount is the smaller of the movable amount and the length of the overlapping section.
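The backward counterparts are equally short. In this hypothetical sketch, `can_move_backward` encodes criteria 1) to 4) for backward movement and `backward_shift` caps the move so that the moving image frame stays inside the video display time; as before, the criterion is assumed to be tested on the frame's current position.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical half-open time span [start, end) in seconds."""
    start: float
    end: float


def can_move_backward(frame: Span, attention: Span, rule: int = 1) -> bool:
    """Backward-movability criteria 1)-4) above; `rule` selects which one is used."""
    center = (attention.start + attention.end) / 2
    criteria = {
        1: frame.start < attention.end,
        2: frame.start < center,
        3: frame.end < attention.end,
        4: frame.start < attention.start,
    }
    return criteria[rule]


def backward_shift(video: Span, display: Span, overlap: float) -> float:
    """How far the moving image frame moves toward later times.

    Movable amount: the frame must not run past the end of the display time.
    The shift is the smaller of the movable amount and the overlap to remove.
    """
    movable = display.end - video.end
    return max(0.0, min(movable, overlap))
```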

In the second embodiment, the moving image playback frame is confined to the video display time, so it can be moved only when the playback time of the moving image is shorter than the video display time. Movement in this case means shifting the time at which the data is placed. As shown in FIG. 16B, the moving image data and audio data of video number 2 can be moved forward.

After the movement amount of the moving image playback frame is calculated in step S2003, it is determined in step S2004 whether the movement amount is sufficient; if it is, the process ends.

If it is determined in step S2004 that the movement amount is insufficient, or in step S2002 that the moving image playback frame cannot be moved, the movable amount of the audio playback frame is calculated in step S2005, and the process ends. FIGS. 14E and 14F show how the movable amount is calculated when the audio playback frame is movable: FIG. 14E shows movement forward, and FIG. 14F shows movement backward. The following four examples are given as criteria for deciding whether the frame can move forward, but other criteria may be used.
1) The end of the audio playback frame is after the attention start position of the video.
2) The end of the audio playback frame is after the attention center position of the video.
3) The start of the audio playback frame is after the attention start position of the video.
4) The end of the audio playback frame is after the attention end position of the video.

The movable amount is set within a range in which the audio playback frame does not overlap the immediately preceding audio and does not start before the moving image playback frame. The movement amount is the smaller of the movable amount and the length of the overlapping section.
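Steps S2002 to S2005 form a simple fallback: try to shift the whole moving image frame, and if that is impossible or not enough, shift only the audio frame. The Python sketch below covers only a forward move of a moving-image item (the still-image branch S2006 follows the first embodiment and is not reproduced). The names are hypothetical, and two points are assumptions: the moving image shift counts as sufficient only if it removes the whole overlap, and an insufficient moving image shift is discarded in favor of the audio-only move.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical half-open time span [start, end) in seconds."""
    start: float
    end: float

    @property
    def length(self) -> float:
        return self.end - self.start


@dataclass
class MovieItem:
    """Hypothetical container for one moving-image item on the time chart."""
    display: Span  # video display time
    video: Span    # moving image playback frame
    audio: Span    # audio playback frame, inside the moving image frame


def plan_forward_move(item: MovieItem, prev_audio_end: float, overlap: float):
    """Sketch of steps S2002-S2005 for a forward move; returns (frame_to_move, amount)."""
    # S2002: the moving image frame can move only if it is shorter than the display time
    if item.video.length < item.display.length:
        # S2003: shift the whole moving image frame, carrying its audio with it
        movable = min(item.audio.start - prev_audio_end,
                      item.video.start - item.display.start)
        shift = max(0.0, min(movable, overlap))
        if shift >= overlap:  # S2004: assumed "sufficient" = removes the whole overlap
            return "video", shift
    # S2005: move only the audio frame, which must stay after the preceding audio
    # and must not start before the moving image frame
    movable = min(item.audio.start - prev_audio_end, item.audio.start - item.video.start)
    return "audio", max(0.0, min(movable, overlap))
```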

The following four examples are given as criteria for deciding whether the frame can move backward, but other criteria may be used.
1) The start of the audio playback frame is before the attention end position of the video.
2) The start of the audio playback frame is before the attention center position of the video.
3) The end of the audio playback frame is before the attention end position of the video.
4) The start of the audio playback frame is before the attention start position of the video.

The movable amount is set so that the audio playback frame does not extend beyond the boundary of the video display time. The movement amount is the smaller of the movable amount and the length of the overlapping section.

The flowchart of FIG. 15 calculates the type of movement and the movable amount; steps S319, S328, and S322 of the flowchart of FIG. 3 then perform the movement based on the calculated result.

In steps S319 and S328, the immediately preceding audio is moved forward. FIG. 16C shows the result of moving the audio data of video number 2 in FIG. 16B forward.

In step S322, the target audio data is moved backward. As shown in FIG. 16D, the audio playback frame of video number 7 overlaps the immediately preceding audio data, so the overlap must be adjusted. However, the moving image data extends beyond the boundary of the video display frame, so step S2002 determines that the moving image playback frame cannot be moved. Therefore, in step S2005, the amount by which the audio playback frame is to be moved backward is calculated, and the audio playback frame is moved based on the result. FIG. 16E shows the state after the movement. In FIG. 16E, for the audio data of video number 8, step S312 of the flowchart of FIG. 3 determines that it overlaps the audio playback frame of the immediately preceding video number 7. The process proceeds from step S312 to S313 and then to S318; because video number 7 cannot be moved forward, the process proceeds to step S321, where it is determined whether video number 8 can be moved backward. Since it can, the audio data of video number 8 is moved backward in step S322. In FIG. 16F, after this movement, the audio still overlaps the audio playback frame of the immediately preceding video number 7, so the process proceeds from step S323 to S324.
In step S324, it is determined that cutting is possible, and the process advances to step S325, where the tail of the audio playback frame of video number 7 is cut. FIG. 16G shows the adjusted result.
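The adjustment traced above for video numbers 7 and 8 (move the later audio backward as far as allowed, then cut the tail of the earlier audio if an overlap remains) can be sketched for a single pair of audio frames. The function name and the `backward_room` parameter are hypothetical, and the "cutting possible" check of step S324 is assumed to always succeed.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical half-open time span [start, end) in seconds."""
    start: float
    end: float


def resolve_overlap(prev_audio: Span, audio: Span, backward_room: float):
    """Sketch of S321-S325 for one pair of adjacent audio frames.

    The later audio is first moved backward by at most `backward_room`
    (S321/S322); if it still overlaps, the tail of the earlier audio is cut
    (S324/S325). Cutting is assumed to always be allowed here.
    """
    overlap = prev_audio.end - audio.start
    if overlap <= 0:
        return prev_audio, audio
    shift = max(0.0, min(backward_room, overlap))
    audio = Span(audio.start + shift, audio.end + shift)
    if prev_audio.end > audio.start:  # S323: an overlap remains
        prev_audio = Span(prev_audio.start, audio.start)
    return prev_audio, audio
```

For example, if the earlier audio runs from 10 s to 18 s, the later audio starts at 16 s, and it can move back only 1 s, the later audio moves to 17 s and the earlier one is cut to end at 17 s, the same pattern as FIGS. 16E to 16G.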

With the time chart created in this way, when video number 7 begins to appear in the slide show, its moving image starts playing, but its audio starts only once the attention time begins. The audio reproduced then is synchronized with the moving image being played. When video number 2 appears in the slide show, the moving image is not played at first; a still image is displayed, and playback of the moving image and audio begins when the moving image playback time is reached. After the moving image and audio finish, the still image is displayed until the video display time ends. When the moving image data is shorter than the video display time, the remaining time may be filled with a still image as described above, or the moving image may be played repeatedly before and after its playback frame.
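The playback behavior just described can be modeled as a function of the timeline position. In this hypothetical Python sketch, `movie_origin` (an assumed parameter) is the timeline time at which the first frame of the moving image data would play if it were not clipped, so a long, center-aligned moving image starts partway through its data; outside the moving image frame the item is filled with a still image, which is one of the two options mentioned above.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical half-open time span [start, end) in seconds."""
    start: float
    end: float


def output_at(t: float, display: Span, video: Span, audio: Span, movie_origin: float):
    """What one slide-show item outputs at timeline time t.

    Returns None when the item is not on screen; otherwise (picture, position, audio_on),
    where picture is "movie" or "still" and position is the offset into the
    moving image data (None for a still image).
    """
    if not (display.start <= t < display.end):
        return None
    audio_on = audio.start <= t < audio.end
    if video.start <= t < video.end:
        return "movie", t - movie_origin, audio_on
    # outside the moving image playback frame, a still image fills the gap
    return "still", None, audio_on
```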

In the initial arrangement methods of FIGS. 14A and 14B, the portion of the moving image that is played back is determined by the overall positional relationship, regardless of the content of the moving image. FIG. 22 shows an example of setting the moving image playback frame by aligning the center positions. In this example, the video display time is shorter than the moving image data, so only the middle portion is played. If an important scene is at the end of the moving image, it may be missed.
To prevent this, means is provided for detecting a climax scene and arranging the moving image so that that part is always displayed. FIG. 1D shows a block diagram of the information processing apparatus in this case.

Reference numeral 2308 denotes climax detection means for detecting a climax scene in a moving image.

Elements 101 to 107 are the same as those in the block diagram of FIG. 1A; they are given the same reference numerals, and their description is omitted.

Various methods of climax detection exist: for example, manually attaching in advance a flag that marks the climax portion and then detecting that flag, automatically analyzing the audio to detect portions such as applause or cheering, or automatically analyzing the video to detect portions with motion. The method is not limited here.

FIG. 18 shows an example of settings made after a climax scene is detected. When the last part of the moving image is detected as the climax scene, its position is aligned with the end of the attention time, and the moving image playback frame and audio playback frame are set before that point. In this case, when the video begins to be displayed, the moving image is played from partway through the actual data. Audio playback starts before the attention time begins, and playback of both the moving image and the audio ends when the attention time ends. From the end of moving image playback until the end of the video display time, a still image of the last frame of the moving image may, for example, be displayed.
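A minimal sketch of this climax alignment, assuming the detector reports `climax_at`, the offset in seconds of the end of the climax within the moving image data (for the case above, the moving image's full length). The names are hypothetical, and capping the audio frame at twice the attention time is an assumption borrowed from the third initial-length option.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical half-open time span [start, end) in seconds."""
    start: float
    end: float


def align_climax(display: Span, attention: Span, climax_at: float):
    """Place a moving image so that its climax point lands on the end of the attention time.

    Returns (video frame, audio frame, origin), where origin is the timeline time at
    which the first frame of the moving image data would play.
    """
    origin = attention.end - climax_at
    # playback is clipped at the display start and stops at the end of the attention time
    video = Span(max(origin, display.start), attention.end)
    # assumed audio cap: twice the attention time, ending together with the climax
    audio_len = 2 * (attention.end - attention.start)
    audio = Span(max(attention.end - audio_len, video.start), attention.end)
    return video, audio, origin
```

With a 0 to 10 s display window, an attention time of 4 to 6 s and a 20 s moving image whose climax is at its end, playback starts 14 s into the data at the display start, the audio starts at 2 s (before the attention time), and both end at 6 s.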

Speech section detection may also be applied to the audio data of moving images, so that speech and non-speech sections are distinguished when audio playback frames are set or cut. In that case, the speech section detection unit 1208 of FIG. 1C is added to the block diagram of FIG. 1D.

Alternatively, the audio data may be arranged to fit within the display time without using an attention time. FIGS. 19 and 20 show examples of the initial arrangement of the audio playback time and of the movement criteria for this case. FIG. 19 explains the symbols used for the arrangement and movement methods: FIG. 19A describes the variables, and FIGS. 19B and 19C illustrate the positional relationships, FIG. 19B for moving image data shorter than the display time and FIG. 19C for moving image data longer than the display time. For the initial arrangement, FIG. 20A shows an example in which the moving image data is shorter than the video display time, and FIG. 20B an example in which it is longer. When the moving image data is shorter than the video display time, FIG. 20C shows the criterion for forward movement and FIG. 20D the criterion for backward movement. When it is longer, FIG. 20E shows the criterion for forward movement and FIG. 20F the criterion for backward movement.

In the above example, every video has the same display time, but the display time may instead be specified per video. It may be entered manually, or the importance of each video may be determined by analyzing its audio or images and reflected in its display time; the method is not limited. FIG. 21 shows an example of creating a time chart when display times differ between videos. FIG. 21A shows the display time and attention time preset for each video, and FIG. 21B shows the audio data of FIG. 4A initially arranged according to these settings. Some audio still overlaps, but because some videos have long display times, there are also gaps between audio data.

In the above example, initial arrangement and overlap adjustment are performed in separate flowcharts, but a single flowchart may instead repeatedly place one item and then adjust it.

Also, in the above example, the video display time is determined before video playback starts, but means for changing the display time during playback may be provided. FIG. 1E shows a block diagram for this case, in which 2901 denotes a display time changing unit. The time change may be specified by entering numerical values with the button 2801 in FIG. 1E, or the speed may be changed with a slider bar or touch panel instead of buttons.

When the speed is changed during video playback, a time chart is not created for all videos in advance; instead, videos are placed one at a time and the overlap is adjusted after each placement. In addition, the audio data is not cut by the audio editing unit 104; instead, the playback time adjustment unit 103 controls how long it plays. During normal playback, a time chart for a fixed number of videos is created with the preset display time and then played. When the display time is changed via 2901, the already-created time charts of the video being displayed and the video about to be displayed are re-created, and video playback continues.

FIG. 22 shows an example of a time chart in which the display time is changed partway through its creation. FIG. 22A is a time chart created with the default display time for the first six videos. The time chart is updated at intervals such that it always includes the currently displayed video and the one immediately after it. If a change 3001 that doubles the display time occurs while video number 3 is at the attention position and video number 5 has just started to be displayed, the time chart is updated as shown in FIG. 22B. In this example, the change instruction is not applied immediately; the new display time takes effect when the current video leaves the attention position. FIG. 22B is the time chart at the moment display of video number 6 starts, and it includes the playback times up to video number 7. The change in display speed may, however, be applied immediately instead.

FIG. 22C shows an example in which playback continues with the doubled display time and a time chart up to video number 9 has been created. If a change 3002 that reduces the display time to 1/4 occurs while video number 6 is at the attention position and video number 8 has just started to be displayed, the time chart is updated as shown in FIG. 22D.

If the display time is changed so that it becomes shorter than a certain length, the audio may not be understandable even if it is played. In that case, audio playback may be disabled, or a display time change that would make the display time shorter than that length may be rejected.
When the display time is changed during video playback, an additional condition applies when determining whether audio can be moved: audio whose playback has already started cannot be moved forward or backward.
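Both fallbacks for very short display times, and the extra condition on audio that is already playing, reduce to small checks. In the sketch below, `min_time`, the policy names, and the function names are all hypothetical.

```python
def change_display_time(current: float, requested: float, min_time: float,
                        policy: str = "mute") -> tuple:
    """Apply a display time change requested during playback.

    Returns (display_time, audio_enabled). min_time is a hypothetical threshold
    below which spoken audio is assumed to be unintelligible.
    """
    if requested >= min_time:
        return requested, True
    if policy == "mute":    # accept the short display time but disable audio playback
        return requested, False
    if policy == "reject":  # ignore the change and keep the current display time
        return current, True
    raise ValueError(f"unknown policy: {policy}")


def audio_can_move(audio_start: float, now: float) -> bool:
    """Audio whose playback has already started may not be moved."""
    return audio_start > now
```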

Accordingly, audio data can be shortened by detecting non-speech sections and removing those at its beginning and end. If long audio data still does not fit within the display time of one video, it is cut in a non-speech section, which prevents the sound from being cut off in the middle of a word.
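A minimal sketch of the trimming and cutting idea, assuming per-frame energy values as input. A fixed energy threshold is a simple stand-in chosen for illustration; the document does not specify how the speech section detection unit 1208 decides what is speech. `trim_and_cut` and its parameters are hypothetical.

```python
def trim_and_cut(energies: list, threshold: float, max_frames: int) -> tuple:
    """Return (start, end) frame indices [start, end) of the audio to keep.

    A frame counts as speech when its energy is at least `threshold`.
    Leading and trailing non-speech frames are dropped; if the speech still
    exceeds `max_frames`, it is cut at a non-speech frame so no word is split.
    """
    speech = [e >= threshold for e in energies]
    if not any(speech):
        return 0, 0
    start = speech.index(True)
    end = len(speech) - speech[::-1].index(True)  # one past the last speech frame
    if end - start <= max_frames:
        return start, end
    limit = start + max_frames
    # latest cut point at or before the limit whose next frame is non-speech;
    # fall back to a hard cut at the limit if there is no pause
    cut = next((c for c in range(limit, start, -1) if not speech[c]), limit)
    while cut > start and not speech[cut - 1]:  # drop silence just before the cut
        cut -= 1
    return start, cut
```

For the energy sequence `[0, 0, 5, 5, 0, 5, 5, 5, 0, 0]` with a threshold of 1, trimming alone keeps frames 2 to 7; with a limit of 4 frames the cut falls in the pause at frame 4, keeping only frames 2 and 3.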

Further, providing climax detection means and aligning the climax position within the display time of the moving image prevents an important part of a long moving image from being missed in a slide show.

Further, providing means for specifying the display time of each video in advance allows the display time to be adjusted to the content of the video.

Further, providing means for changing the display time during video playback allows the display speed to be changed partway through playback while the audio playback times are adjusted accordingly.

(Modification of Embodiment 2)
In Embodiment 2, the initial moving image playback frame uses the display time as its maximum length, but a value smaller than the display time may be specified in advance instead.

(Other Embodiments)
The object of the present invention can also be achieved by supplying a system or the like with a storage medium storing software program code that implements the functions of the above-described embodiments, and having the system read and execute the program code stored in the storage medium.

In this case, the program code read from the storage medium itself implements the functions of the above-described embodiments, and the storage medium storing the program code constitutes the present invention.

As the storage medium for supplying the program code, a CD-ROM, for example, can be used.

The invention also covers the case where an OS (operating system) running on the computer performs part or all of the actual processing based on the instructions of the program code, and that processing implements the functions of the above-described embodiments.

Claims

An information processing apparatus capable of displaying each of a plurality of images on a display unit for a fixed time in a predetermined order and of displaying two or more images simultaneously, the apparatus comprising:
adjustment means for adjusting the start and end times of output of the sound associated with each image, within a range that does not exceed the time during which that image is displayed, so that the sounds associated with the respective images are not output simultaneously; and
output control means for outputting the sounds based on a result of the adjustment by the adjustment means.

An information processing apparatus that displays a plurality of images according to preset timing, comprising:
determining means for determining the display start and end times of the images such that the time PE(m) at which display of the m-th image (m > 0) ends is later than the time PS(m+1) at which display of the (m+1)-th image starts, and the period from the time PS(m) at which display of the m-th image starts to PE(m) is equal to the period from PS(m+1) to the time PE(m+1) at which display of the (m+1)-th image ends;
acquisition means for acquiring data representing the sound associated with each image;
setting means for setting the times at which reproduction of the sound associated with each image starts and ends such that reproduction starts after display of that image starts and ends before display of that image ends; and
changing means for, when the time VS(m+1) at which reproduction of the sound associated with the (m+1)-th image starts is earlier than the time VE(m) at which reproduction of the sound associated with the m-th image ends, changing the reproduction start and end times of the sound associated with the (m+1)-th image such that VS(m+1) is later than VE(m) and the time VE(m+1) at which reproduction of that sound ends is earlier than the time VS(m+2) at which reproduction of the sound associated with the (m+2)-th image starts.

The information processing apparatus according to claim 2, wherein, when the time VS(m+1) at which reproduction of the sound associated with the (m+1)-th image starts is earlier than the time VE(m) at which reproduction of the sound associated with the m-th image ends, the changing means changes the reproduction start and end times of the sound associated with the m-th image such that VE(m) is earlier than VS(m+1) and the time VS(m) at which reproduction of the sound associated with the m-th image starts is later than the time at which display of the m-th image starts.

An information processing apparatus that displays a plurality of images according to preset timing, comprising:
determining means for determining the display start and end times of the images such that the time PS(m) at which display of the m-th image (m > 2) starts is earlier than the time PE(m-1) at which display of the (m-1)-th image ends, and the period from the time PS(m-1) at which display of the (m-1)-th image starts to PE(m-1) is equal to the period from PS(m) to the time PE(m) at which display of the m-th image ends;
acquisition means for acquiring data representing the sound associated with each image;
setting means for setting the times at which reproduction of the sound associated with each image starts and ends such that reproduction starts after display of that image starts and ends before display of that image ends; and
changing means for, when the time VE(m-1) at which reproduction of the sound associated with the (m-1)-th image ends is later than the time VS(m) at which reproduction of the sound associated with the m-th image starts, changing the reproduction start and end times of the sound associated with the (m-1)-th image such that VE(m-1) is earlier than VS(m) and the time VS(m-1) at which reproduction of that sound starts is later than the time VE(m-2) at which reproduction of the sound associated with the (m-2)-th image ends.

The information processing apparatus according to claim 4, wherein the determining means further determines the display start and end times of the images such that PE(m) is later than the time PS(m+1) at which display of the (m+1)-th image starts, and the period from PS(m) to PE(m) is equal to the period from PS(m+1) to the time PE(m+1) at which display of the (m+1)-th image ends; and
the changing means further changes, when the time VS(m+1) at which reproduction of the sound associated with the (m+1)-th image starts is earlier than the time VE(m) at which reproduction of the sound associated with the m-th image ends, the reproduction start and end times of the sound associated with the (m+1)-th image such that VS(m+1) is later than VE(m) and the time VE(m+1) at which reproduction of that sound ends is earlier than the time VS(m+2) at which reproduction of the sound associated with the (m+2)-th image starts.

The information processing apparatus according to any one of claims 2 to 5, further comprising shortening means for, when the change of the sound reproduction timing makes VE(m+1) equal to or later than PE(m+1), changing the time for reproducing the sound associated with the (m+1)-th image to the period from the changed VS(m+1) to PE(m+1), and, when VS(m-1) is earlier than PS(m-1), changing the time for reproducing the sound associated with the (m-1)-th image to the period from PS(m-1) to the changed VE(m-1).

Furthermore, control means is provided that, when the time for reproducing a sound is shortened, changes the speed at which the sound is reproduced so that the sound represented by the data is reproduced within the shortened time. The information processing apparatus described.

An operation method of an information processing apparatus capable of displaying each of a plurality of images on a display unit for a fixed time in a predetermined order and of displaying two or more images simultaneously, the method comprising:
an adjustment step of adjusting the start and end times of output of the sound associated with each image, within a range that does not exceed the time during which that image is displayed, so that the sounds associated with the respective images are not output simultaneously; and
an output control step of outputting the sounds based on a result of the adjustment in the adjustment step.

A program for causing a computer to execute the operation method according to claim 8.