JP2006270921A - Imaging apparatus, imaging method, output device, output method and program

Info

Publication number: JP2006270921A
Application number: JP2005362465A
Authority: JP
Inventors: Ichigaku Mino; 一学三野; Akira Yoda; 章依田; Yukita Gotoda; 祐己太後藤田
Original assignee: Fuji Photo Film Co Ltd
Current assignee: Fujifilm Holdings Corp
Priority date: 2005-02-25
Filing date: 2005-12-15
Publication date: 2006-10-05
Anticipated expiration: 2025-12-15
Also published as: JP4741362B2

Abstract

PROBLEM TO BE SOLVED: To provide an output device that can easily output, for a captured image, sound that users do not tire of.

SOLUTION: This output device comprises an image storage module for storing multiple captured images, an image output module for outputting any of the images stored in the image storage module, an image output control module for causing the image output module to output any of the images stored in the image storage module, a sound storage module for storing multiple recorded sounds, a sound output module for outputting any of the sounds stored in the sound storage module, and a sound output control module that, when an image is output via the image output module, selects a first sound from among the multiple sounds stored in the sound storage module and causes the sound output module to output it, and, when the same image is output again via the image output module, selects a second sound (different from the first one) from among the multiple sounds stored in the sound storage module and causes the sound output module to output it.

COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to an imaging device, an imaging method, an output device, an output method, and a program. In particular, it relates to an imaging device and an imaging method for capturing images, an output device and an output method for outputting images, and a program for the imaging device and the output device.

Conventionally, there are digital still cameras that can record not only still images but also moving images on a memory card, and that can record sound detected by a microphone during shooting in association with the still or moving image (see, for example, Patent Document 1). Electronic photo stands are also known that can output the sound recorded in association with an image while displaying a still or moving image taken with a digital still camera.
Patent Document 1: JP-A-7-154734

However, when an image captured with such a camera is played back on an electronic photo stand, the same sound is always played for the same image, so the user tires of the sound that accompanies the image. It is also desirable for the user to be able to enjoy, together with the image, sound that does not become tiresome, without cumbersome work such as editing images and sound. Furthermore, Patent Document 1 does not disclose a technique for recording sound in association with an image composited from a plurality of images.

Accordingly, an object of the present invention is to provide an imaging device, an imaging method, an output device, an output method, and a program that can solve the above problems. This object is achieved by the combinations of features recited in the independent claims. The dependent claims define further advantageous embodiments of the present invention.

An output device according to a first aspect of the present invention includes: an image storage unit that stores a plurality of captured images; an image output unit that outputs the images stored in the image storage unit; an image output control unit that causes the image output unit to output the images stored in the image storage unit; a sound storage unit that stores a plurality of recorded sounds; a sound output unit that outputs the sounds stored in the sound storage unit; and a sound output control unit that, while the image output unit is outputting an image, selects a first sound from among the plurality of sounds stored in the sound storage unit and causes the sound output unit to output it, and, when the image output unit outputs the same image again, selects a second sound different from the first sound from among the plurality of sounds stored in the sound storage unit and causes the sound output unit to output it.
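The behavior of the sound output control unit can be illustrated with a minimal Python sketch. The class name, the per-image bookkeeping, and the round-robin rule are assumptions made for illustration; the text only requires that the second sound differ from the first when the same image is output again.

```python
class SoundOutputController:
    """Hypothetical controller: picks a different sound each time an image recurs."""

    def __init__(self, sounds):
        # Needs at least two stored sounds for a repeat to differ.
        self.sounds = list(sounds)
        self.last_index = {}  # image id -> index of the sound chosen last time

    def select(self, image_id):
        prev = self.last_index.get(image_id)
        # First output of this image: start at sound 0.
        # Repeat output: advance to the next sound (round-robin is an assumption).
        idx = 0 if prev is None else (prev + 1) % len(self.sounds)
        self.last_index[image_id] = idx
        return self.sounds[idx]


controller = SoundOutputController(["waves", "gulls", "laughter"])
first = controller.select("beach.jpg")
second = controller.select("beach.jpg")
```

With two or more stored sounds, `second` always differs from `first` for the same image.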

The sound storage unit may store a plurality of sounds recorded by the recording function of the imaging device that captured the plurality of images stored in the image storage unit. The sound storage unit may store a plurality of sounds recorded during a recording period that contains, and is longer than, the imaging period, that is, the period spanning the times at which the images stored in the image storage unit were captured. The total duration of the sounds stored in the sound storage unit may be longer than the preset output time for which the image output unit outputs one image, multiplied by the number of images stored in the image storage unit.
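The duration condition in the last sentence is simple arithmetic, sketched here in Python. The function name and the example numbers are illustrative assumptions and do not come from the specification.

```python
def has_enough_sound(sound_durations, num_images, seconds_per_image):
    """True if the total stored sound is longer than one full pass over the images."""
    return sum(sound_durations) > num_images * seconds_per_image


# 135 s of sound against 10 images shown for 10 s each (100 s).
enough = has_enough_sound([40, 50, 45], num_images=10, seconds_per_image=10)
# 60 s of sound against the same 100 s slideshow.
too_short = has_enough_sound([30, 30], num_images=10, seconds_per_image=10)
```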

The image storage unit may store, in association with each of the plurality of images, information indicating the timing at which that image was captured; the sound storage unit may store, in association with each of the plurality of sounds, information indicating the timing at which that sound was recorded; and the sound output control unit may select sounds in order of how close their recording timing is to the timing at which the image was captured.

The image storage unit may store, in association with each of the plurality of images, the time at which that image was captured; the sound storage unit may store, in association with each of the plurality of sounds, the time at which that sound was recorded; and the sound output control unit may select sounds in order of how close their recording time is to the time at which the image was captured. The sound output control unit may select sounds in descending order of volume.
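Selection in order of time proximity can be sketched as a sort on the absolute difference between recording time and capture time. The tuple layout `(name, recorded_at)` and the use of plain seconds are assumptions for illustration.

```python
def order_by_time_proximity(capture_time, sounds):
    """Return sounds sorted so that those recorded closest to capture_time come first.

    sounds: list of (name, recorded_at) tuples; times are in seconds (assumed unit).
    """
    return sorted(sounds, key=lambda s: abs(s[1] - capture_time))


ordered = order_by_time_proximity(
    100, [("crowd", 160), ("waves", 95), ("gulls", 110)]
)
names = [name for name, _ in ordered]
```

Ordering by volume would follow the same pattern with a volume field as the sort key and `reverse=True`.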

The output device may further include an output count holding unit that counts and holds the output count, that is, the number of times the sounds stored in the sound storage unit have been output by the sound output unit, and a target count storage unit that stores the target count, that is, the number of times the sounds stored in the sound storage unit should be output by the sound output unit; the sound output control unit may select sounds in descending order of the value obtained by subtracting the output count from the target count.
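A minimal sketch of the target-count rule, assuming counts are kept per sound in dictionaries (the data layout and names are hypothetical):

```python
def pick_by_remaining(target_counts, output_counts):
    """Pick the sound with the largest (target count - output count)."""
    return max(
        target_counts,
        key=lambda name: target_counts[name] - output_counts.get(name, 0),
    )


target = {"waves": 5, "gulls": 3, "laughter": 4}
played = {"waves": 4, "gulls": 0, "laughter": 2}
# Remaining plays: waves 1, gulls 3, laughter 2.
choice = pick_by_remaining(target, played)
```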

The output device may further include an output count holding unit that counts and holds the output count, that is, the number of times the sounds stored in the sound storage unit have been output by the sound output unit, and an output ratio storage unit that stores the output ratio, that is, the ratio of the numbers of times the sounds stored in the sound storage unit should be output by the sound output unit; the sound output control unit may select sounds so that the ratio of the output counts held by the output count holding unit approaches the output ratio stored in the output ratio storage unit.
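One way to steer the played ratio toward the stored output ratio is to pick, each time, the sound whose actual share falls furthest below its target share. This greedy rule is an assumption made for illustration; the text does not specify how the ratio is approached.

```python
def pick_by_ratio(target_ratio, output_counts):
    """Pick the sound whose played share is furthest below its target share."""
    total_played = sum(output_counts.values())
    total_ratio = sum(target_ratio.values())

    def deficit(name):
        want = target_ratio[name] / total_ratio
        have = output_counts.get(name, 0) / total_played if total_played else 0.0
        return want - have

    return max(target_ratio, key=deficit)


ratio = {"waves": 2, "gulls": 1, "laughter": 1}
# Target shares: waves 0.5, gulls 0.25, laughter 0.25.
# Played shares so far: waves 0.5, gulls 0.5, laughter 0.0.
next_sound = pick_by_ratio(ratio, {"waves": 1, "gulls": 1, "laughter": 0})
```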

The sound storage unit may store each of a plurality of sounds, recorded by the recording function both in an imaging mode (an operation mode in which the imaging device accepts imaging operations) and in a non-imaging mode (an operation mode in which the imaging device does not accept imaging operations), in association with the operation mode in effect when the sound was recorded; the sound output control unit may select sounds recorded while the imaging device was in the imaging mode in preference to sounds recorded in the non-imaging mode.
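Mode-based preference can be sketched as a stable sort that moves imaging-mode recordings ahead of non-imaging-mode ones. The mode labels and tuple layout are hypothetical.

```python
def order_by_mode(sounds):
    """Put sounds recorded in imaging mode first, keeping the original order otherwise.

    sounds: list of (name, mode) tuples, mode being "imaging" or "non-imaging".
    """
    # False sorts before True, and Python's sort is stable.
    return sorted(sounds, key=lambda s: s[1] != "imaging")


ordered_by_mode = order_by_mode(
    [("a", "non-imaging"), ("b", "imaging"), ("c", "non-imaging"), ("d", "imaging")]
)
mode_names = [name for name, _ in ordered_by_mode]
```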

The output device may further include a restriction count storage unit that stores the number of times output of a sound was restricted while the sound output unit was outputting that sound; the sound output control unit may preferentially select sounds for which the count stored in the restriction count storage unit is smaller.

The output device may further include an output instruction reception unit that receives an instruction to have the image output unit output the plurality of images stored in the image storage unit, and an output time detection unit that detects the time at which the output instruction reception unit received the instruction. The image storage unit may store, in association with each of the plurality of images, the time at which that image was captured, and the sound storage unit may store, in association with each of the plurality of sounds, the time at which that sound was recorded. Based on the difference between the time detected by the output time detection unit and the time at which the images stored in the image storage unit were captured, the sound output control unit may set an allowable range for the difference between the time at which the images were captured and the recording time of the sound to be selected from among the sounds stored in the sound storage unit.

The larger the difference between the time detected by the output time detection unit and the time at which the images stored in the image storage unit were captured, the larger the sound output control unit may set the allowable range for the difference between the time at which the images were captured and the recording time of the sound to be selected from among the sounds stored in the sound storage unit.
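The widening tolerance can be sketched as follows. The linear growth rule and the constants `base` and `factor` are assumptions for illustration; the text only states that the allowable range grows as the gap between output time and capture time grows.

```python
def allowed_gap(output_time, capture_time, base=60.0, factor=0.001):
    """Allowable |recording time - capture time|, growing with elapsed time (assumed linear)."""
    elapsed = abs(output_time - capture_time)
    return base + factor * elapsed


def candidate_sounds(output_time, capture_time, sounds):
    """Names of sounds whose recording time lies within the allowable range."""
    gap = allowed_gap(output_time, capture_time)
    return [name for name, rec in sounds if abs(rec - capture_time) <= gap]


stored = [("near", 30), ("far", 600)]  # (name, recorded_at) in seconds
soon = candidate_sounds(3_600, 0, stored)       # viewed an hour after capture
later = candidate_sounds(1_000_000, 0, stored)  # viewed much later
```

Viewing the image long after capture admits sounds recorded further from the capture time, so the pool of candidates grows.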

An output method according to a second aspect of the present invention includes: an image storage step of storing a plurality of captured images; an image output step of outputting the images stored in the image storage step; an image output control step of causing the images stored in the image storage step to be output in the image output step; a sound storage step of storing a plurality of recorded sounds; a sound output step of outputting the sounds stored in the sound storage step; and a sound output control step of, while an image is being output in the image output step, selecting a first sound from among the plurality of sounds stored in the sound storage step and causing it to be output in the sound output step, and, when the same image is output again in the image output step, selecting a second sound different from the first sound from among the plurality of sounds stored in the sound storage step and causing it to be output in the sound output step.

According to a third aspect of the present invention, a program for an output device that outputs images causes the output device to function as: an image storage unit that stores a plurality of captured images; an image output unit that outputs the images stored in the image storage unit; an image output control unit that causes the image output unit to output the images stored in the image storage unit; a sound storage unit that stores a plurality of recorded sounds; a sound output unit that outputs the sounds stored in the sound storage unit; and a sound output control unit that, while the image output unit is outputting an image, selects a first sound from among the plurality of sounds stored in the sound storage unit and causes the sound output unit to output it, and, when the image output unit outputs the same image again, selects a second sound different from the first sound from among the plurality of sounds stored in the sound storage unit and causes the sound output unit to output it.

An imaging device according to a fourth aspect of the present invention includes: an imaging unit; an object extraction unit that extracts an object included in an image captured by the imaging unit; a sound acquisition unit that acquires sound related to the object extracted by the object extraction unit; an object position specifying unit that specifies the position, in the image captured by the imaging unit, of the object related to the sound acquired by the sound acquisition unit; and a sound storage unit that stores the sound acquired by the sound acquisition unit in association with the position of the object specified by the object position specifying unit.

The imaging device may further include a recording unit that records sound around the imaging unit. The sound acquisition unit may extract, from the sound recorded by the recording unit, sound related to the object extracted by the object extraction unit; the object position specifying unit may specify the position, in the image captured by the imaging unit, of the object related to the sound extracted by the sound acquisition unit; and the sound storage unit may store the sound extracted by the sound acquisition unit in association with the position of the object specified by the object position specifying unit.

An imaging method according to a fifth aspect of the present invention includes: an imaging step; an object extraction step of extracting an object included in an image captured in the imaging step; a sound acquisition step of acquiring sound related to the object extracted in the object extraction step; an object position specifying step of specifying the position, in the image captured in the imaging step, of the object related to the sound acquired in the sound acquisition step; and a sound storage step of storing the sound acquired in the sound acquisition step in association with the position of the object specified in the object position specifying step.

According to a sixth aspect of the present invention, a program for an imaging device that captures images causes the imaging device to function as: an imaging unit; an object extraction unit that extracts an object included in an image captured by the imaging unit; a sound acquisition unit that acquires sound related to the object extracted by the object extraction unit; an object position specifying unit that specifies the position, in the image captured by the imaging unit, of the object related to the sound acquired by the sound acquisition unit; and a sound storage unit that stores the sound acquired by the sound acquisition unit in association with the position of the object specified by the object position specifying unit.

An output device according to a seventh aspect of the present invention includes: an image storage unit that stores an image; a sound storage unit that stores sound in association with the image stored in the image storage unit and with a position in that image; a partial region range acquisition unit that acquires the range of a partial region including at least part of the image stored in the image storage unit; an output image generation unit that generates an output image from the portion of the stored image within the partial region range acquired by the partial region range acquisition unit; an output sound generation unit that generates output sound from the sound that the sound storage unit stores in association with the in-whole-image position, that is, the position in the stored image at which the acquired partial region range lies; and an image output unit that outputs the output image generated by the output image generation unit and the output sound generated by the output sound generation unit in association with each other so that they are output in synchronization.

The output image generation unit may generate the output image by compositing the portion of the stored image within the partial region range acquired by the partial region range acquisition unit with another image stored in the image storage unit. The output sound generation unit may generate the output sound from the sound that the sound storage unit stores in association with the in-whole-image position at which the acquired partial region range lies and from the sound that the sound storage unit stores in association with the other image included in the output image. The image output unit may output the output image generated by the output image generation unit and the output sound generated by the output sound generation unit in association with each other so that they are output in synchronization.

The output device may further include a sound database that stores sounds in association with object types; the sound storage unit may acquire and store the sound that the sound database stores in association with the type of the object located at the position associated with the image stored in the image storage unit.

The output sound generation unit may generate output sound that gives greater emphasis to the sound that the sound storage unit stores in association with the image containing an object occupying a larger area in the output image and with the in-whole-image position at which that object lies. The output sound generation unit may generate output sound in which the sound stored in association with the image containing an object occupying a larger area in the output image, and with the in-whole-image position of that object, is mixed at a higher volume.
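Area-based emphasis can be sketched by mixing each object's sound with a gain proportional to the area the object occupies in the output image. Proportional gains and the list-of-floats sample format are assumptions for illustration; the text only requires that larger objects be mixed louder.

```python
def area_gains(areas):
    """Gain per object, proportional to its area in the output image (assumed rule)."""
    total = sum(areas.values())
    return {name: area / total for name, area in areas.items()}


def mix(tracks, gains):
    """Weighted sum of equal-rate sample lists, truncated to the shortest track."""
    length = min(len(samples) for samples in tracks.values())
    return [sum(gains[n] * tracks[n][i] for n in tracks) for i in range(length)]


gains = area_gains({"dog": 300, "child": 100})  # areas in pixels
mixed = mix({"dog": [1.0, 0.0], "child": [0.0, 1.0]}, gains)
```

The same mixing function could serve a front-to-back or center-distance rule by swapping in a different gain function.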

The output sound generation unit may generate output sound that gives greater emphasis to the sound that the sound storage unit stores in association with the image containing an object placed further toward the front in the output image and with the in-whole-image position at which that object lies. The output sound generation unit may generate output sound in which the sound stored in association with the image containing an object placed further toward the front in the output image, and with the in-whole-image position of that object, is mixed at a higher volume.

An output method according to an eighth aspect of the present invention includes: an image storage step of storing an image; a sound storage step of storing sound in association with the image stored in the image storage step and with a position in that image; a partial region range acquisition step of acquiring the range of a partial region including at least part of the image stored in the image storage step; an output image generation step of generating an output image from the portion of the stored image within the partial region range acquired in the partial region range acquisition step; an output sound generation step of generating output sound from the sound stored in the sound storage step in association with the in-whole-image position, that is, the position in the stored image at which the acquired partial region range lies; and an image output step of outputting the output image generated in the output image generation step and the output sound generated in the output sound generation step in association with each other so that they are output in synchronization.

According to a ninth aspect of the present invention, a program for an output device that outputs images causes the output device to function as: an image storage unit that stores an image; a sound storage unit that stores sound in association with the image stored in the image storage unit and with a position in that image; a partial region range acquisition unit that acquires the range of a partial region including at least part of the image stored in the image storage unit; an output image generation unit that generates an output image from the portion of the stored image within the partial region range acquired by the partial region range acquisition unit; an output sound generation unit that generates output sound from the sound that the sound storage unit stores in association with the in-whole-image position, that is, the position in the stored image at which the acquired partial region range lies; and an image output unit that outputs the output image generated by the output image generation unit and the output sound generated by the output sound generation unit in association with each other so that they are output in synchronization.

An output device according to a tenth aspect of the present invention includes: an image storage unit that stores a plurality of images; a sound storage unit that stores a plurality of sounds in association with the respective images stored in the image storage unit; an output image generation unit that generates an output image by compositing a plurality of images stored in the image storage unit; an output sound generation unit that generates output sound using a first sound and a second sound that the sound storage unit stores in association with, respectively, a first image and a second image included in the output image generated by the output image generation unit; and an image output unit that outputs the output image generated by the output image generation unit and the output sound generated by the output sound generation unit in association with each other so that they are output in synchronization. When the first image is emphasized over the second image in the output image generated by the output image generation unit, the output sound generation unit generates output sound in which the first sound is mixed with greater emphasis than the second sound.

When the first image is larger than the second image in the output image generated by the output image generation unit, the output sound generation unit may generate output sound in which the first sound is mixed with greater emphasis than the second sound. When the first image is in front of the second image in the output image generated by the output image generation unit, the output sound generation unit may generate output sound in which the first sound is mixed with greater emphasis than the second sound.

When the first image is located closer to the center than the second image in the output image generated by the output image generation unit, the output sound generation unit may generate output sound in which the first sound is mixed with greater emphasis than the second sound. When the first image is emphasized over the second image in the output image generated by the output image generation unit, the output sound generation unit may generate output sound in which the first sound is mixed at a higher volume than the second sound.

An output method according to an eleventh aspect of the present invention includes: an image storage step of storing a plurality of images; a sound storage step of storing a plurality of sounds in association with the respective images stored in the image storage step; an output image generation step of generating an output image by compositing a plurality of images stored in the image storage step; an output sound generation step of generating output sound using a first sound and a second sound stored in the sound storage step in association with, respectively, a first image and a second image included in the output image generated in the output image generation step; and an image output step of outputting the output image generated in the output image generation step and the output sound generated in the output sound generation step in association with each other so that they are output in synchronization. In the output sound generation step, when the first image is emphasized over the second image in the output image generated in the output image generation step, output sound is generated in which the first sound is mixed with greater emphasis than the second sound.

According to a twelfth aspect of the present invention, a program for an output device that outputs images causes the output device to function as: an image storage unit that stores a plurality of images; a sound storage unit that stores a plurality of sounds in association with the respective images stored in the image storage unit; an output image generation unit that generates an output image by compositing a plurality of images stored in the image storage unit; an output sound generation unit that generates output sound using a first sound and a second sound that the sound storage unit stores in association with, respectively, a first image and a second image included in the output image generated by the output image generation unit; and an image output unit that outputs the output image generated by the output image generation unit and the output sound generated by the output sound generation unit in association with each other so that they are output in synchronization. The program causes the output sound generation unit to generate, when the first image is emphasized over the second image in the output image generated by the output image generation unit, output sound in which the first sound is mixed with greater emphasis than the second sound.

なお上記の発明の概要は、本発明の必要な特徴の全てを列挙したものではなく、これらの特徴群のサブコンビネーションもまた発明となりうる。 Note that the above summary of the invention does not enumerate all the necessary features of the present invention, and sub-combinations of these feature groups can also be the invention.

According to the present invention, it is possible to provide an output device that outputs, for a captured image, sound that the user does not tire of.

Hereinafter, the present invention will be described through embodiments of the invention. The following embodiments do not limit the invention according to the claims, and not all combinations of the features described in the embodiments are necessarily essential to the solving means of the invention.

図１は、本発明の一実施形態に係る音声出力システムの一例を示す。音声出力システムは、撮像装置１００、出力装置１４０、及び音声データベース１９０を備える。この例では、撮像装置１００は、海岸に遊びにきている人の画像を撮像する。また、撮像装置１００は、撮像装置１００の周囲の音をマイクロホン１０２で録音する。撮像装置１００は、撮像装置１００が撮像した画像及び録音した音声を、インターネット等の通信回線１５０を通じて出力装置１４０に送信する。出力装置１４０は、撮像装置１００から受け取った画像を出力するときに、撮像装置１００から受け取った音声を同期させて出力する。このとき、出力装置１４０は、同じ画像を再度出力するときには、前回出力した音声とは異なる音声を出力する。このため、ユーザ１８０は、画像を出力する毎に異なる音声を楽しむことができるので、飽きることなく画像を鑑賞することができる。 FIG. 1 shows an example of an audio output system according to an embodiment of the present invention. The audio output system includes an imaging device 100, an output device 140, and an audio database 190. In this example, the imaging apparatus 100 captures an image of a person who is visiting the beach. Further, the imaging apparatus 100 records sounds around the imaging apparatus 100 with the microphone 102. The imaging device 100 transmits the image captured by the imaging device 100 and the recorded sound to the output device 140 through a communication line 150 such as the Internet. When outputting the image received from the imaging device 100, the output device 140 synchronizes and outputs the sound received from the imaging device 100. At this time, when outputting the same image again, the output device 140 outputs a sound different from the sound output last time. For this reason, since the user 180 can enjoy different sounds every time an image is output, the user 180 can appreciate the image without getting tired.

出力装置１４０は、例えば、ＨＤＴＶ、電子フォトスタンド、コンピュータ等の、画像及び音声を出力する装置であってよい。また、出力装置１４０は、音声を文字として出力してもよい。例えば、出力装置１４０は、液晶等の表示デバイスに画像を表示するときに、音声を文字として表示デバイスに表示させる。なお、出力装置１４０は、画像を表示させる表示デバイスに文字を表示させてよく、画像を表示させる表示デバイスとは別の表示デバイスに文字を表示させてもよい。他にも、出力装置１４０は、プリンタ等の画像を印刷する印刷装置であってもよく、画像を印刷するとともに音声を文字として印刷してもよい。 The output device 140 may be a device that outputs images and sounds, such as an HDTV, an electronic photo stand, and a computer. Further, the output device 140 may output the voice as characters. For example, when displaying an image on a display device such as a liquid crystal, the output device 140 displays sound on the display device as characters. Note that the output device 140 may display characters on a display device that displays an image, and may display characters on a display device that is different from the display device that displays an image. In addition, the output device 140 may be a printing device that prints an image, such as a printer, and may print the image and sound as characters.

撮像装置１００は、例えば、デジタルスチルカメラ、カメラ付携帯電話等であってよい。また、撮像装置１００は、出力装置１４０が有する、画像又は音声を出力する機能を有してもよい。また、撮像装置１００が画像及び音声データを記録媒体に記録して、出力装置１４０は当該記録媒体からデータを受け取ることによって、画像及び音声を出力してもよい。また、撮像装置１００は、画像及び音声データを、通信回線１５０に接続されたサーバの、ユーザ１８０毎にそれぞれ設けられたディレクトリ、例えば撮像装置１００と関連付けられたディレクトリに格納してもよい。そして出力装置１４０は、ユーザ１８０毎にサーバに格納された画像及び音声データを受け取ってもよい。 The imaging device 100 may be, for example, a digital still camera, a camera-equipped mobile phone, or the like. Further, the imaging apparatus 100 may have a function of outputting an image or sound that the output apparatus 140 has. Further, the image capturing apparatus 100 may record image and sound data on a recording medium, and the output apparatus 140 may output the image and sound by receiving data from the recording medium. In addition, the imaging apparatus 100 may store the image and audio data in a directory provided for each user 180 of the server connected to the communication line 150, for example, a directory associated with the imaging apparatus 100. The output device 140 may receive image and sound data stored in the server for each user 180.

The imaging device 100 also extracts objects such as dogs and birds that appear in a captured image and identifies the type of each extracted object. The imaging device 100 then acquires a representative sound for the identified type of object from the audio database 190, which stores representative sounds for each type of object, such as dogs and birds. The imaging device 100 provides the acquired sound to the output device 140 in association with the captured image. The output device 140 accepts image editing instructions from the user 180. For example, the output device 140 combines a plurality of images designated by the user 180 in a layout designated by the user 180 to generate an output image. At this time, the output device 140 generates an output sound by combining the sounds associated with the images used to generate the output image at a volume ratio equal to the ratio of the areas that those images occupy in the output image. The output device 140 then plays back the generated output sound in synchronization with the display of the output image. For example, the output device 140 therefore lets the user 180 view an output image composed from an image containing a dog and an image containing a bird, together with an output sound in which the dog's bark and the bird's call are combined. The user 180 can thus easily enjoy, using the output device 140, sound suited to the content of the edited output image.
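The area-ratio mixing described above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of the output device 140: the function names `gains_by_area` and `mix`, the representation of each sound as a list of float samples, and the example areas are all hypothetical.

```python
def gains_by_area(areas):
    """Return one volume gain per image, proportional to the area it occupies in the layout."""
    total = sum(areas.values())
    if total <= 0:
        raise ValueError("the layout needs at least one image with positive area")
    return {image_id: area / total for image_id, area in areas.items()}


def mix(tracks, gains):
    """Mix the sounds associated with each image, weighting every track by its gain.

    tracks maps image_id -> list of samples; the mix is cut to the shortest track.
    """
    length = min(len(samples) for samples in tracks.values())
    return [
        sum(gains[image_id] * tracks[image_id][i] for image_id in tracks)
        for i in range(length)
    ]


# Hypothetical layout: the dog image covers three times the area of the bird image.
gains = gains_by_area({"dog": 300.0, "bird": 100.0})
output_sound = mix({"dog": [1.0, 1.0], "bird": [0.0, 4.0]}, gains)
```

With these example areas the dog's sound receives three quarters of the total volume and the bird's sound one quarter, mirroring the 3:1 area ratio.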

図２は、出力装置１４０のブロック構成の一例を示す。本図は、同じ画像に対して続けて同じ音声が出力されないように制御する出力装置１４０のブロック構成の一例を示す。出力装置１４０は、画像格納部２１０、画像出力制御部２１２、画像出力部２１４、出力指示受付部２４０、出力許容時間設定部２４２、出力時刻検出部２４４、音声格納部２２０、音声出力制御部２２２、音声出力部２２４、出力回数保持部２３０、目標回数格納部２３２、出力比率格納部２３４、及び制限回数格納部２３６を備える。 FIG. 2 shows an example of a block configuration of the output device 140. This figure shows an example of a block configuration of an output device 140 that controls so that the same sound is not continuously output for the same image. The output device 140 includes an image storage unit 210, an image output control unit 212, an image output unit 214, an output instruction reception unit 240, an output allowable time setting unit 242, an output time detection unit 244, an audio storage unit 220, and an audio output control unit 222. , An audio output unit 224, an output number holding unit 230, a target number storage unit 232, an output ratio storage unit 234, and a limit number storage unit 236.

画像格納部２１０は、撮像された複数の画像を格納する。画像格納部２１０は、複数の画像のそれぞれに対応づけて、複数の画像が撮像されたタイミングを示す情報をそれぞれ格納する。具体的には、画像格納部２１０は、複数の画像のそれぞれに対応づけて、複数の画像が撮像された時刻をそれぞれ格納する。 The image storage unit 210 stores a plurality of captured images. The image storage unit 210 stores information indicating the timing at which the plurality of images are captured in association with each of the plurality of images. Specifically, the image storage unit 210 stores the times at which a plurality of images are captured in association with each of the plurality of images.

出力指示受付部２４０は、画像格納部２１０が格納している複数の画像を画像出力部２１４に出力させるべき旨の指示を受け付ける。画像出力制御部２１２は、出力指示受付部２４０の指示に基づいて、画像格納部２１０が格納している画像を画像出力部２１４に出力させる。画像出力部２１４は、例えば画像を出力する液晶等の表示デバイスであってよく、画像を印刷する印刷デバイスであってもよい。 The output instruction receiving unit 240 receives an instruction to the image output unit 214 to output a plurality of images stored in the image storage unit 210. The image output control unit 212 causes the image output unit 214 to output the image stored in the image storage unit 210 based on an instruction from the output instruction reception unit 240. The image output unit 214 may be a display device such as a liquid crystal that outputs an image, or may be a printing device that prints an image.

The audio storage unit 220 stores a plurality of recorded sounds. For example, the audio storage unit 220 stores a plurality of sounds recorded by the recording function of the imaging device 100 that captured the plurality of images stored in the image storage unit 210. Specifically, the audio storage unit 220 stores sounds recorded by the recording function of the imaging device 100 when the images stored in the image storage unit 210 were captured, as well as sounds recorded by that recording function when those images were not being captured.

The audio storage unit 220 also stores, in association with each of the plurality of sounds, information indicating the timing at which that sound was recorded. Specifically, the audio storage unit 220 stores, in association with each sound, the time at which the sound was recorded. The recorded time may be the time at which recording started or the time at which recording ended. Alternatively, the audio storage unit 220 may store, as the information indicating the timing, for example the time at which recording of the plurality of sounds started together with information indicating the order in which the sounds were recorded.

The audio storage unit 220 stores a plurality of sounds recorded during a recording period that includes, and is longer than, the imaging period, which is the period containing the times at which the images stored in the image storage unit 210 were captured. The total duration of the sounds stored in the audio storage unit 220 is longer than the preset output time for which the image output unit 214 outputs one image multiplied by the number of images stored in the image storage unit 210.

The audio output unit 224 outputs the sounds stored in the audio storage unit 220. Specifically, the audio output unit 224 may be a playback device that plays back sound. The audio output unit 224 may also include a display device that outputs characters, such as a liquid crystal display, and may cause the display device to output a sound stored in the audio storage unit 220 as characters. The audio output unit 224 may also include a printing device that prints characters, such as a character printer, and may print a sound stored in the audio storage unit 220 as characters.

音声出力制御部２２２は、画像出力部２１４が画像を出力しているときに、音声格納部２２０が格納している複数の音声のうちから第１の音声を選択して音声出力部２２４に出力させ、画像出力部２１４が同一の画像を再度出力するときに、音声格納部２２０が格納している複数の音声のうちから第１の音声とは異なる第２の音声を選択して音声出力部２２４に出力させる。このため、ユーザ１８０は画像を出力する毎に異なる音声を楽しむことができる。 The audio output control unit 222 selects the first audio from the plurality of audios stored in the audio storage unit 220 and outputs it to the audio output unit 224 when the image output unit 214 outputs an image. When the image output unit 214 outputs the same image again, the second sound different from the first sound is selected from the plurality of sounds stored in the sound storage unit 220, and the sound output unit 224 to output. For this reason, the user 180 can enjoy different sounds each time an image is output.

なお、音声出力制御部２２２は、録音されたタイミングが、画像が撮像されたタイミングから近い順に音声を選択する。例えば、音声出力制御部２２２は、録音された時刻が、画像が撮像された時刻から近い順に音声を選択する。他にも、音声出力制御部２２２は、音量が大きい順に音声を選択してもよい。このため、ユーザ１８０は、画像を撮像したときの特徴的な音から順に音声を楽しむことができる。 Note that the audio output control unit 222 selects audio in the order in which the recorded timing is closer to the timing at which the image was captured. For example, the audio output control unit 222 selects audio in the order in which the recorded time is closer to the time when the image was captured. In addition, the audio output control unit 222 may select audio in descending order of volume. For this reason, the user 180 can enjoy the sound in order from the characteristic sound when the image is captured.
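As a minimal sketch of this selection order, the sounds can be sorted by the absolute difference between their recording time and the capture time, and the next sound in that order handed out each time the same image is output again. The names `order_by_closeness` and `ReplaySelector`, the use of plain numbers as timestamps, and the wrap-around once every sound has been played are assumptions made for illustration.

```python
def order_by_closeness(sounds, captured_at):
    """Sort (sound_id, recorded_at) pairs so the recording nearest the capture time comes first."""
    return sorted(sounds, key=lambda sound: abs(sound[1] - captured_at))


class ReplaySelector:
    """Hands out the next sound in the ordering each time the same image is output again."""

    def __init__(self, sounds, captured_at):
        self._order = [sound_id for sound_id, _ in order_by_closeness(sounds, captured_at)]
        self._next = 0

    def select(self):
        # Assumption: start over from the closest sound after the list is exhausted.
        sound_id = self._order[self._next % len(self._order)]
        self._next += 1
        return sound_id


# Hypothetical timestamps in seconds; the image was captured at t = 120.
selector = ReplaySelector([("s1", 100.0), ("s2", 130.0), ("s3", 118.0)], captured_at=120.0)
first, second = selector.select(), selector.select()
```

Because consecutive calls walk through the ordering, two successive outputs of the same image receive different sounds whenever at least two sounds are stored.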

The output number holding unit 230 counts and holds, for the sounds stored in the audio storage unit 220, an output count, which is the number of times each sound has been output by the audio output unit 224. The target number storage unit 232 stores a target count, which is the number of times each sound stored in the audio storage unit 220 should be output by the audio output unit 224. The audio output control unit 222 may then select sounds in descending order of the value obtained by subtracting the output count from the target count.
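The ordering by remaining plays (target count minus output count) might look like the following sketch. The dictionaries standing in for the target number storage unit 232 and the output number holding unit 230, and the function name `order_by_remaining`, are hypothetical.

```python
def order_by_remaining(target_counts, output_counts):
    """Return sound IDs sorted so the sound with the most remaining plays comes first."""
    def remaining(sound_id):
        return target_counts[sound_id] - output_counts.get(sound_id, 0)

    return sorted(target_counts, key=remaining, reverse=True)


# Hypothetical counts: "b" has been played 4 of its 5 target plays, "a" none of its 3.
order = order_by_remaining({"a": 3, "b": 5, "c": 2}, {"b": 4, "c": 0})
```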

出力比率格納部２３４は、音声格納部２２０が格納している複数の音声が音声出力部２２４に出力されるべき回数の比率である出力比率を格納する。そして、音声出力制御部２２２は、出力回数保持部２３０が保持する出力回数の比率が、出力比率格納部２３４が格納している出力比率に近づくように音声を選択してもよい。このため、ユーザ１８０は、画像を撮像したときの特徴的な音声をより多く楽しむことができる。 The output ratio storage unit 234 stores an output ratio that is a ratio of the number of times that a plurality of sounds stored in the sound storage unit 220 should be output to the sound output unit 224. Then, the sound output control unit 222 may select the sound so that the ratio of the number of outputs held by the output number holding unit 230 approaches the output ratio stored in the output ratio storage unit 234. For this reason, the user 180 can enjoy more characteristic sound when an image is captured.
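One way to make the actual output counts approach a stored output ratio is to pick, each time, the sound whose share of plays so far falls furthest below its desired share. The text does not specify a rule, so the following is only one possible sketch; `pick_by_ratio` and the example ratios are assumptions.

```python
def pick_by_ratio(desired_ratio, output_counts):
    """Pick the sound whose actual share of plays is furthest below its desired share."""
    total_ratio = sum(desired_ratio.values())
    total_played = sum(output_counts.get(sound_id, 0) for sound_id in desired_ratio)

    def deficit(sound_id):
        desired_share = desired_ratio[sound_id] / total_ratio
        if total_played == 0:
            return desired_share
        return desired_share - output_counts.get(sound_id, 0) / total_played

    return max(desired_ratio, key=deficit)


# Hypothetical ratio: the wave sound should be played three times as often as the gull.
ratio = {"wave": 3, "gull": 1}
choice_first = pick_by_ratio(ratio, {})
choice_after_wave = pick_by_ratio(ratio, {"wave": 1})
```

Repeatedly calling the function and incrementing the count of the returned sound keeps the play counts close to the stored ratio.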

なお、音声格納部２２０は、撮像装置１００が撮像動作を受け付ける状態の動作モードである撮像モードの場合、及び撮像装置１００が撮像動作を受け付けない状態の動作モードである非撮像モードの場合の双方において録音機能によって録音された複数の音声のそれぞれを、音声が録音されたときの動作モードに対応づけて格納する。そして、音声出力制御部２２２は、撮像装置１００が撮像モードのときに録音された音声を、非撮像モードのときに録音された音声より優先的に選択する。 Note that the audio storage unit 220 is both in an imaging mode, which is an operation mode in a state where the imaging device 100 accepts an imaging operation, and in a non-imaging mode, which is an operation mode in which the imaging device 100 does not accept an imaging operation. Each of the plurality of voices recorded by the recording function is stored in association with the operation mode when the voice is recorded. Then, the audio output control unit 222 preferentially selects the audio recorded when the imaging apparatus 100 is in the imaging mode over the audio recorded when in the non-imaging mode.

The limit number storage unit 236 counts and stores the number of times that output of a sound stored in the audio storage unit 220 was restricted while the sound was being output from the audio output unit 224. The audio output control unit 222 preferentially selects sounds for which the count stored in the limit number storage unit 236 is smaller. The audio output control unit 222 may also calculate, from the output count held by the output number holding unit 230 and the count stored in the limit number storage unit 236, a restriction ratio indicating how often output of the sound was restricted, and may preferentially select sounds with a smaller restriction ratio.
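A restriction ratio of this kind can be sketched as the restricted count divided by the output count. The function name, the dictionary inputs, and the choice to treat a never-played sound as having ratio zero are assumptions, not details given in the text.

```python
def order_by_restriction_ratio(output_counts, restricted_counts):
    """Return sound IDs sorted so the sound restricted least often (relative to plays) comes first."""
    def ratio(sound_id):
        played = output_counts.get(sound_id, 0)
        # Assumption: a sound that has never been played counts as never restricted.
        return restricted_counts.get(sound_id, 0) / played if played else 0.0

    return sorted(output_counts, key=ratio)


# Hypothetical history: "a" was restricted in 5 of 10 plays, "b" in 1 of 4.
preferred = order_by_restriction_ratio({"a": 10, "b": 4}, {"a": 5, "b": 1})
```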

出力時刻検出部２４４は、出力指示受付部２４０が指示を受け付けた時刻を検出する。そして、出力許容時間設定部２４２は、出力時刻検出部２４４が検出した時刻と、画像格納部２１０が格納している複数の画像が撮像された時刻との差に基づいて、画像格納部２１０が格納している複数の画像が撮像された時刻と、音声格納部２２０が格納している複数の音声から選択する音声が録音された時刻との差の許容範囲を設定する。そして、音声出力制御部２２２は、出力許容時間設定部２４２が設定した許容範囲の範囲内で録音された音声の中から、音声出力部２２４に出力させる音声を選択する。 The output time detection unit 244 detects the time when the output instruction reception unit 240 receives the instruction. Then, based on the difference between the time detected by the output time detection unit 244 and the time when the plurality of images stored in the image storage unit 210 are captured, the output allowable time setting unit 242 An allowable range of a difference between the time when the plurality of stored images are captured and the time when the sound selected from the plurality of sounds stored in the sound storage unit 220 is recorded is set. Then, the audio output control unit 222 selects the audio to be output to the audio output unit 224 from the audio recorded within the allowable range set by the output allowable time setting unit 242.

Specifically, the larger the difference between the time detected by the output time detection unit 244 and the time at which the images stored in the image storage unit 210 were captured, the larger the output allowable time setting unit 242 sets the allowable range of the difference between the capture time of those images and the recording time of the sound to be selected from the sounds stored in the audio storage unit 220. As a result, when the output device 140 outputs an image captured in the recent past, it selects the sound from among sounds recorded close to the capture time, which prevents the sound output with the image from seeming unnatural to the user 180. When outputting an image from the distant past, the output device 140 can output a sound selected from among sounds recorded over a wider time range, so the user 180 can enjoy more characteristic sounds.
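The text only requires that the allowable range grow as the image gets older; it does not give a formula. The sketch below assumes a linear rule with a hypothetical base window of ten minutes and a hypothetical growth factor of 1 % of the image's age, and then filters the candidate sounds by that window.

```python
from datetime import datetime, timedelta


def allowed_window(captured_at, requested_at, base=timedelta(minutes=10), growth=0.01):
    """Return the allowed gap between capture time and recording time.

    Assumption: the window grows linearly with the age of the image.
    """
    age = max(requested_at - captured_at, timedelta(0))
    return base + age * growth


def candidates(sounds, captured_at, window):
    """Keep the IDs of (sound_id, recorded_at) pairs recorded within the window around the capture time."""
    return [sound_id for sound_id, recorded_at in sounds if abs(recorded_at - captured_at) <= window]


captured = datetime(2005, 1, 1, 12, 0)
sounds = [
    ("near", captured + timedelta(minutes=20)),
    ("far", captured + timedelta(hours=2)),
]
recent_window = allowed_window(captured, datetime(2005, 1, 2, 12, 0))  # viewed one day later
old_window = allowed_window(captured, datetime(2006, 1, 1, 12, 0))     # viewed one year later
```

Under these assumed parameters, viewing the image a day after capture admits only the sound recorded twenty minutes away, while viewing it a year later also admits the sound recorded two hours away.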

以上説明した出力装置１４０によれば、同じ画像を再度出力するときでも、複数の音声の中から出力する音声を選択して出力するので、ユーザ１８０は飽きることなく音声と画像とを鑑賞することができる。また、ユーザ１８０は、出力装置１４０に出力させる画像が撮像されたときのより特徴的な音声を、当該画像とともに鑑賞することができる。 According to the output device 140 described above, even when the same image is output again, the output sound is selected and output from a plurality of sounds, so that the user 180 can appreciate the sound and the image without getting bored. Can do. Further, the user 180 can appreciate a more characteristic sound when an image to be output to the output device 140 is captured together with the image.

FIG. 3 shows an example of the data stored in the audio storage unit 220 in table form. The audio storage unit 220 stores, in association with each item of sound data, a sound ID that identifies the sound data and the time at which the sound was recorded. The time at which the sound was recorded may be the recording start time or the recording end time, and may include the date on which the sound was recorded.

図４は、音声の録音と画像の撮像との時間関係の一例を示す。撮像装置１００は、動作モードとして、待機モード、撮像モード、及び出力モードを有する。そして、出力装置１４０は、撮像装置１００が待機モード、撮像モード、出力モードのいずれかである期間に録音した音声を格納する。なお、撮像モードは、撮像装置１００が撮像及び／又は録音することのできる動作モードであってよい。例えば、撮像モードとは、撮像装置１００がユーザ１８０によるレリーズボタンの押下によって画像を撮像することができる動作モードであってよい。また、出力モードは、撮像装置１００が画像及び／又は音声を出力することのできる動作モードであってよい。例えば、出力モードとは、撮像装置１００がメモリ等に格納している画像を読み出して、モニタ画面等の表示デバイスに表示することのできる動作モードであってよい。 FIG. 4 shows an example of the time relationship between audio recording and image capturing. The imaging apparatus 100 has a standby mode, an imaging mode, and an output mode as operation modes. The output device 140 stores the sound recorded during the period when the imaging device 100 is in the standby mode, the imaging mode, or the output mode. Note that the imaging mode may be an operation mode in which the imaging apparatus 100 can capture and / or record. For example, the imaging mode may be an operation mode in which the imaging apparatus 100 can capture an image when the user 180 presses the release button. The output mode may be an operation mode in which the imaging apparatus 100 can output an image and / or sound. For example, the output mode may be an operation mode in which the image capturing apparatus 100 reads an image stored in a memory or the like and can display the image on a display device such as a monitor screen.

The imaging device 100 is set to the standby mode immediately after startup. When the operation mode is set to the standby mode or the output mode, the imaging device 100 transitions to the imaging mode if the user 180 performs an operation related to imaging or recording. Operations related to imaging include, for example, an operation to capture an image and operations to adjust imaging conditions such as shutter speed and focal length. Operations related to recording include, for example, an operation to record sound and operations to adjust recording conditions such as recording sensitivity. When the operation mode is set to the standby mode or the imaging mode, the imaging device 100 transitions to the output mode if the user 180 performs an operation related to output by the imaging device 100. Operations related to output include, for example, an operation to output an image, an operation to select an image to output, and operations to adjust output conditions such as output speed. When set to the imaging mode or the output mode, the imaging device 100 may transition to the standby mode on condition that the user 180 has not operated the imaging device 100 for a predetermined period. Thus, if the imaging device 100 receives a press of the release button by the user 180 while the operation mode is set to the standby mode or the output mode, that is, a mode other than the imaging mode, it changes the operation mode to the imaging mode without capturing an image. Alternatively, the standby mode or the output mode may be an operation mode in which the imaging device 100 does not accept a press of the release button by the user 180.
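The mode transitions described above can be summarized as a small state machine. The sketch below is illustrative only: the event names (`"imaging_op"`, `"recording_op"`, `"output_op"`, `"idle_timeout"`) and the helper `handle_release` are hypothetical labels, and it models the variant in which a release-button press outside the imaging mode switches modes without capturing.

```python
from enum import Enum


class Mode(Enum):
    STANDBY = "standby"
    IMAGING = "imaging"
    OUTPUT = "output"


def next_mode(mode, event):
    """Return the operation mode that follows `mode` when `event` occurs."""
    if event in ("imaging_op", "recording_op") and mode in (Mode.STANDBY, Mode.OUTPUT):
        return Mode.IMAGING
    if event == "output_op" and mode in (Mode.STANDBY, Mode.IMAGING):
        return Mode.OUTPUT
    if event == "idle_timeout" and mode in (Mode.IMAGING, Mode.OUTPUT):
        return Mode.STANDBY
    return mode  # every other combination leaves the mode unchanged


def handle_release(mode):
    """Return (new_mode, captured): an image is captured only if already in the imaging mode."""
    if mode is Mode.IMAGING:
        return Mode.IMAGING, True
    return Mode.IMAGING, False


mode = Mode.STANDBY  # the device starts in the standby mode
mode, captured = handle_release(mode)
```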

The audio storage unit 220 stores sounds #404 and #406, which were recorded by the imaging device 100 during the period in which the imaging device 100 was set to the imaging mode (t422 to t428). The audio storage unit 220 also stores sounds #408 and #410, recorded by the imaging device 100 during the period in which it was set to the output mode (t428 to t430), and sounds #402, #403, and #412, recorded during the periods in which it was set to the standby mode (t420 to t422 and t430 to t432).

The audio storage unit 220 also stores sounds whose total duration is longer than the preset output time (Δt452) for which the image output unit 214 outputs one image multiplied by the number of images stored in the image storage unit 210. For example, when the output time for one image is Δt452, the audio storage unit 220 stores, as sounds to be output with the two images captured at time t424 and time t426, two or more sounds whose length equals the output time Δt452 (#404, #406, #408, #410, and #412) and sounds whose length Δt451 is half the output time Δt452 (#402 and #403). The audio output control unit 222 can therefore select and output a different sound from among the plurality of sounds each time an image is output from the image output unit 214.

When selecting sounds for the audio output unit 224 to output, the audio output control unit 222 may select a plurality of sounds so that their total duration equals the preset output time for which the image output unit 214 outputs one image, and cause the audio output unit 224 to output them. For example, when the image captured at time t424 is output, the audio output control unit 222 may select sound #402 and sound #403, each of which is half as long as the output time Δt452, and cause the audio output unit 224 to output them. When the audio storage unit 220 stores a sound longer than the output time Δt452, the audio output control unit 222 may select sound #408 or #410, obtained by dividing that sound into segments of length Δt452, and cause the audio output unit 224 to output it.
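Both options, combining short sounds to fill the output time and cutting a longer sound into output-time segments, can be sketched as below. The function names, the brute-force search over combinations, and the numeric durations (Δt452 is taken as 5.0 seconds purely for the example) are assumptions.

```python
from itertools import combinations


def fill_output_time(durations, output_time, tolerance=1e-9):
    """Return the smallest set of sound IDs whose durations add up to the output time, or []."""
    sound_ids = list(durations)
    for size in range(1, len(sound_ids) + 1):
        for combo in combinations(sound_ids, size):
            if abs(sum(durations[s] for s in combo) - output_time) <= tolerance:
                return list(combo)
    return []


def split_into_segments(duration, output_time):
    """Return (start, end) offsets of the full-length segments cut from a longer sound."""
    count = int(duration // output_time)
    return [(i * output_time, (i + 1) * output_time) for i in range(count)]


# Hypothetical durations: #402 and #403 are each half of the 5-second output time.
chosen = fill_output_time({"#402": 2.5, "#403": 2.5}, output_time=5.0)
segments = split_into_segments(10.0, output_time=5.0)
```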

The image output control unit 212 may adjust the output time for which each image is output from the image output unit 214 based on the number of images to be output from the image output unit 214 and the total duration of the sounds to be output. For example, when the output instruction reception unit 240 receives from the user 180 a plurality of sounds and images to be output, the image output control unit 212 causes the image output unit 214 to output each image for a period equal to the total duration of those sounds divided by the number of images to be output.
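The per-image output time in this example is simply the total sound duration divided by the number of images. A minimal sketch, with a hypothetical function name and example durations in seconds:

```python
def per_image_output_time(sound_durations, image_count):
    """Return how long each image is shown so that the images span all the selected sounds."""
    if image_count <= 0:
        raise ValueError("at least one image is required")
    return sum(sound_durations) / image_count


# Hypothetical selection: four sounds totalling 15 seconds, shown across three images.
seconds_per_image = per_image_output_time([5.0, 5.0, 2.5, 2.5], image_count=3)
```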

When the audio output control unit 222 receives from the output instruction reception unit 240 an instruction to output the image captured at time t424, it outputs, each time the image is output, sounds in order of how close their recording time is to the time at which the image was captured. For example, when outputting the image captured at time t424, the audio output control unit 222 selects sounds #404, #406, #402, #403, #408, #410, and #412 in that order and causes the audio output unit 224 to output them. Alternatively, the audio output control unit 222 selects and outputs sounds in descending order of volume, for example in the order #406, #404, #408, #410, #403, #402, #412. The audio output control unit 222 may select sounds whose volume is greater than a predetermined threshold volume. In that case, the audio output control unit 222 may set the threshold volume so that the total length of the selected sounds is longer than a predetermined output time.

The imaging device 100 may record, from among the sounds around the imaging device 100, sounds louder than a preset set volume. For example, the imaging device 100 may record sounds louder than a preset threshold volume. The imaging device 100 may also record sounds louder than a set volume that has been changed by changing the recording sensitivity. The imaging device 100 may then store each recorded sound in association with the set volume. The image storage unit 210 acquires and stores the sounds recorded by the imaging device 100 in association with the set volume. The audio output control unit 222 may then calculate the ratio of each sound's volume to the set volume associated with it and select sounds in descending order of that ratio. Alternatively, the audio output control unit 222 may select sounds in descending order of the ratio, to the set volume, of the difference between the recorded volume and the set volume at the time of recording. As a result, even a sound whose absolute volume is low is more likely to be played back by the output device 140 if it was recorded with the threshold volume lowered. For example, if the user 180 lowers the threshold volume and records the faint chirping of an insect while photographing it, that faint chirping is more likely to be played back from the output device 140. In this way, the output device 140 can preferentially play back sounds that reflect the intention of the user 180 at the time of imaging.
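The ordering by volume relative to the set volume can be sketched as follows. The tuple layout `(sound_id, volume, set_volume)`, the function name, and the example levels are hypothetical. Note that the alternative measure, (volume - set volume) / set volume, equals the plain ratio minus one, so both measures produce the same order.

```python
def order_by_relative_volume(sounds):
    """Sort (sound_id, volume, set_volume) tuples by volume / set_volume, largest ratio first."""
    ranked = sorted(sounds, key=lambda sound: sound[1] / sound[2], reverse=True)
    return [sound_id for sound_id, _, _ in ranked]


# Hypothetical levels: the insect was recorded quietly, but with a lowered set volume.
ranking = order_by_relative_volume([
    ("insect", 12.0, 10.0),  # ratio 1.2
    ("surf", 60.0, 55.0),    # ratio about 1.09
    ("shout", 80.0, 40.0),   # ratio 2.0
])
```

In this example the quiet insect sound ranks above the much louder surf because it exceeds its own set volume by a larger margin.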

FIG. 5 shows another example of the data stored in the audio storage unit 220 in table form. The audio storage unit 220 stores, in association with sound data, a sound ID that identifies the sound data, the time at which the sound was recorded, and an index. The time stored in the audio storage unit 220 may be, for example, the recording start time. The index stored in the audio storage unit 220 may be, for example, a value indicating the timing at which the sound was recorded. For example, the audio storage unit 220 stores, as the index, a value indicating the position of each sound in the order of recording counted from the recording start time. The audio output control unit 222 determines the timing at which each sound was recorded from the time and index stored in the audio storage unit 220. The audio output control unit 222 may then select sounds in order of how close the timing at which they were recorded is to the timing at which the image output from the image output unit 214 was captured.

なお、音声格納部２２０は、音声データの音量に対応づけて音声データを格納してよい。ここでいう音量とは、音声データの音量の時間的平均値である平均音量であってよく、音声データの最大音量又は最小音量であってもよい。また、音量とは、最大音量と最小音量の平均値である中間音量であってもよい。これにより、音声出力制御部２２２は、音量の大きさの順で音声を順次選択して再生する場合に、速やかに音声を選択することができる。 Note that the voice storage unit 220 may store the voice data in association with the volume of the voice data. Here, the volume may be an average volume that is a temporal average value of the volume of the audio data, and may be a maximum volume or a minimum volume of the audio data. Further, the volume may be an intermediate volume that is an average value of the maximum volume and the minimum volume. As a result, the audio output control unit 222 can quickly select the audio when the audio is sequentially selected and reproduced in the order of the volume.

また、音声格納部２２０は、音声の音量の時間変化に対応づけて音声データを格納してよい。音量の時間変化とは、音声の音量の時間変化そのものであってよいし、音量の大きさの変化を示す情報（例えば、増加速度又は減少速度等）であってよい。そして、音声出力制御部２２２は、画像出力部２１４が表示する画像の大きさ又は大きさの変化に応じて、音声格納部２２０から音声を選択して音声出力部２２４に出力させてよい。例えば、音声出力制御部２２２は、画像出力部２１４が画像を拡大させながら表示する場合には、音量が増大する音声を音声格納部２２０が格納する音声から選択して音声出力部２２４に出力させ、画像出力部２１４が画像を縮小させながら表示する場合には、音量が減少する音声を音声格納部２２０が格納する音声から選択して音声出力部２２４に出力させてよい。 In addition, the voice storage unit 220 may store voice data in association with a time change in the volume of the voice. The time change of the volume may be the time change itself of the sound volume, or may be information (for example, an increase speed or a decrease speed) indicating a change in the volume level. Then, the audio output control unit 222 may select audio from the audio storage unit 220 and output the audio to the audio output unit 224 in accordance with the size of the image displayed by the image output unit 214 or a change in the size. For example, when the image output unit 214 displays the image while enlarging the image, the audio output control unit 222 selects the audio whose volume is increased from the audio stored in the audio storage unit 220 and causes the audio output unit 224 to output it. When the image output unit 214 displays the image while reducing the image, the sound whose volume is reduced may be selected from the sound stored in the sound storage unit 220 and output to the sound output unit 224.
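One way to picture the zoom-dependent selection is the Python sketch below. It assumes each sound is stored with a short volume envelope (a list of volume samples over time) and uses the difference between the last and first samples as a crude trend measure; both the envelope representation and the trend measure are assumptions, not details from the specification.

```python
def volume_trend(envelope):
    """Crude trend: positive when the volume rises, negative when it falls."""
    return envelope[-1] - envelope[0]


def pick_for_zoom(envelopes, zooming_in):
    """Pick a rising sound while enlarging, a falling one while reducing.

    envelopes: dict mapping sound ID -> list of volume samples over time.
    Returns None when no stored sound has the wanted trend.
    """
    sign = 1 if zooming_in else -1
    scores = {sid: sign * volume_trend(env) for sid, env in envelopes.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


envelopes = {
    "approach": [0.2, 0.5, 0.9],  # volume increases over time
    "fade": [0.8, 0.4, 0.1],      # volume decreases over time
    "flat": [0.5, 0.5, 0.5],
}
```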

また、音声格納部２２０は、ステレオ録音された音声について、右及び左の音量の時間変化に対応づけて音声データを格納してよい。右及び左の音量の時間変化とは、右及び左の音量の時間変化そのものであってよいし、音量の大きさの変化を示す情報（例えば、増加速度又は減少程度）であってよい。そして、音声出力制御部２２２は、画像出力部２１４が表示する画像の位置又は位置の変化に応じて、音声格納部２２０から音声を選択して出力してよい。例えば、音声出力制御部２２２は、画像出力部２１４が画像を右から左にスライドさせて表示する場合には、左の音量が増大し、かつ、右の音量が減少する音声を、音声格納部２２０が格納する音声から選択して音声出力部２２４に出力させてよい。これにより、出力装置１４０は、表示される画像の位置、大きさに応じた望ましい音楽を再生することができる。 In addition, for audio recorded in stereo, the audio storage unit 220 may store audio data in association with the time changes of the right and left volumes. The time change of the right and left volumes may be the time change itself, or may be information indicating how the volume changes (for example, a rate of increase or a degree of decrease). The audio output control unit 222 may then select and output audio from the audio storage unit 220 in accordance with the position of the image displayed by the image output unit 214 or a change in that position. For example, when the image output unit 214 displays an image while sliding it from right to left, the audio output control unit 222 may select, from the audio stored in the audio storage unit 220, audio whose left volume increases and whose right volume decreases, and cause the audio output unit 224 to output it. The output device 140 can thereby reproduce desirable music that suits the position and size of the displayed image.

図６は、目標回数格納部２３２が格納するデータの一例をテーブル形式で示す。目標回数格納部２３２は、音声格納部２２０が格納する音声ＩＤに対応づけて、当該音声ＩＤで識別される音声が音声出力部２２４に出力されるべき回数である目標回数を格納する。なお、出力回数保持部２３０は、音声格納部２２０が格納する音声ＩＤに対応づけて、音声出力部２２４が出力された音声が出力された出力回数を格納している。そして、音声出力制御部２２２は、目標回数から、出力回数保持部２３０が保持する出力回数を引いた値を計算して、当該値が大きい順に音声を選択して音声出力部２２４に出力させる。このため、例えば撮像装置１００が撮像したときのより特徴的な音声に対して目標回数をより多く設定することによって、画像を出力するときに、撮像したときの特徴的な音声を多く出力させることができる。そして、撮像したときの特徴的な音声が何度も出力された後には他の音声も時々出力されていくので、ユーザ１８０は飽きることなく画像を鑑賞することができる。 FIG. 6 shows, in table form, an example of data stored in the target number storage unit 232. The target number storage unit 232 stores, in association with each voice ID stored in the voice storage unit 220, a target number, which is the number of times the voice identified by that voice ID should be output by the voice output unit 224. The output number holding unit 230 stores, in association with each voice ID stored in the voice storage unit 220, the number of times that voice has been output by the voice output unit 224. The sound output control unit 222 then calculates the value obtained by subtracting the output count held by the output number holding unit 230 from the target number, selects voices in descending order of that value, and causes the voice output unit 224 to output them. Thus, for example, by setting a larger target number for a sound that is more characteristic of the moment the imaging apparatus 100 captured the image, that characteristic sound can be output more often when the image is output. After the characteristic sound has been output many times, other sounds are also output from time to time, so the user 180 can view the image without getting bored.
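The target-count rule described here can be sketched in Python as follows. The sound IDs and target values are illustrative only, and breaking ties by dictionary order is an assumption; the specification does not say how ties are resolved.

```python
def select_next(target_counts, output_counts):
    """Return the sound ID with the largest (target count - output count).

    Ties go to the ID that appears first in target_counts (an assumption).
    """
    return max(
        target_counts,
        key=lambda sid: target_counts[sid] - output_counts.get(sid, 0),
    )


target_counts = {"cheer": 3, "wind": 1}  # illustrative target numbers
output_counts = {}                        # nothing has been output yet
played = []
for _ in range(4):
    sid = select_next(target_counts, output_counts)
    output_counts[sid] = output_counts.get(sid, 0) + 1
    played.append(sid)
```

In this run the sound with the higher target is chosen until its remaining count falls below that of the other sound, after which the other sound gets its turn.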

なお、目標回数格納部２３２は、ユーザ１８０によって設定された目標回数を格納してよい。他にも、目標回数格納部２３２は、音声格納部２２０が格納する音声の持つ音量に基づいて目標回数を設定してもよい。例えば、目標回数格納部２３２は、音量のより大きい音声に対して目標回数をより多く設定してもよい。他にも、目標回数格納部２３２は、人の声が含まれる音声が出力されるべき目標回数を、人の声が含まれない音声が出力されるべき目標回数よりも多く設定してもよい。 The target number storage unit 232 may store a target number set by the user 180. Alternatively, the target number storage unit 232 may set the target number based on the volume of the sound stored in the sound storage unit 220. For example, the target number storage unit 232 may set a larger target number for a sound with a higher volume. The target number storage unit 232 may also set the target number for a sound that contains a human voice to be larger than the target number for a sound that does not contain a human voice.

また、目標回数格納部２３２は、複数の音声が出力されるべき回数である目標回数を、画像格納部２１０が格納する画像毎に格納してよい。具体的には、目標回数格納部２３２は、画像格納部２１０が格納する画像ＩＤ、音声ＩＤ、及び目標回数を格納する。そして音声出力制御部２２２は、画像出力部２１４から画像が出力されるときに、当該画像を識別する画像ＩＤに対応付けて格納された複数の音声の中から、目標回数から出力回数を引いた値が大きい順に音声を選択して、音声出力部２２４に出力させる。 Further, the target number storage unit 232 may store a target number of times that a plurality of sounds should be output for each image stored in the image storage unit 210. Specifically, the target number storage unit 232 stores the image ID, the sound ID, and the target number of times that the image storage unit 210 stores. When the image output unit 214 outputs an image, the audio output control unit 222 subtracts the output number from the target number of times from among a plurality of sounds stored in association with the image ID for identifying the image. Voices are selected in descending order of value and output to the voice output unit 224.

図７は、出力比率格納部２３４が格納するデータの一例をテーブル形式で示す。出力比率格納部２３４は、音声格納部２２０が格納する音声ＩＤに対応づけて、当該音声ＩＤで識別される音声が音声出力部２２４に出力されるべき回数の比率である出力比率を格納する。そして、音声出力制御部２２２は、出力回数保持部２３０が保持する出力回数に基づいてそれぞれの音声が出力された回数の比率を計算して、出力回数の比率が、出力比率格納部２３４が格納する出力比率に近づくように音声を選択する。このため、例えば撮像装置１００によって撮像したときの特徴的な音声に対して出力比率を大きく設定すると、撮像したときのより特徴的な音声をより多く出力させることができる。このため、ユーザ１８０は、画像を鑑賞ながらいろいろな音声を楽しみつつ、撮像したときの特徴的な音声を何度も楽しむことができる。 FIG. 7 shows an example of data stored in the output ratio storage unit 234 in a table format. The output ratio storage unit 234 stores an output ratio that is a ratio of the number of times that the voice identified by the voice ID is to be output to the voice output unit 224 in association with the voice ID stored in the voice storage unit 220. Then, the sound output control unit 222 calculates the ratio of the number of times each sound is output based on the number of outputs held by the output number holding unit 230, and the output ratio storage unit 234 stores the ratio of the number of output times. Select the audio so that it approaches the output ratio. For this reason, for example, when the output ratio is set to be large with respect to the characteristic sound when the image is captured by the image capturing apparatus 100, more characteristic sound when the image is captured can be output more. For this reason, the user 180 can enjoy a variety of sounds while viewing the image, and can enjoy the characteristic sounds when the image is captured many times.
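A minimal Python sketch of ratio-driven selection is given below. It picks the sound whose share of past outputs falls furthest short of its stored output ratio; this "largest deficit" rule is one plausible reading of "approaching the output ratio" and is an assumption, as are the sound IDs and ratio values.

```python
def select_by_ratio(target_ratios, output_counts):
    """Pick the sound whose actual output share lags its target share most."""
    total_outputs = sum(output_counts.values())
    total_ratio = sum(target_ratios.values())

    def deficit(sid):
        actual = output_counts.get(sid, 0) / total_outputs if total_outputs else 0.0
        return target_ratios[sid] / total_ratio - actual

    return max(target_ratios, key=deficit)


target_ratios = {"cheer": 3, "wind": 1}  # illustrative 3:1 output ratio
counts = {sid: 0 for sid in target_ratios}
for _ in range(8):
    counts[select_by_ratio(target_ratios, counts)] += 1
```

After eight selections with these values, the output counts settle at the stored 3:1 ratio.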

なお、出力比率格納部２３４は、ユーザ１８０によって設定された出力比率を格納してよい。他にも、出力比率格納部２３４は、音声格納部２２０が格納する音声の持つ音量に基づいて出力比率を設定してもよい。例えば、出力比率格納部２３４は、音量のより大きい音声を出力する出力比率をより大きく設定してもよい。他にも、出力比率格納部２３４は、音声格納部２２０が格納する音声のうち、人の声が含まれる音声が出力されるべき出力比率を、人の声が含まれない音声が出力される出力比率よりも多く設定してもよい。 The output ratio storage unit 234 may store an output ratio set by the user 180. Alternatively, the output ratio storage unit 234 may set the output ratio based on the volume of the sound stored in the sound storage unit 220. For example, the output ratio storage unit 234 may set a larger output ratio for a sound with a higher volume. The output ratio storage unit 234 may also set the output ratio for sounds stored in the sound storage unit 220 that contain a human voice to be larger than the output ratio for sounds that do not contain a human voice.

また、画像格納部２１０は、複数の音声が出力されるべき回数の比率である出力比率を、画像格納部２１０が格納する画像毎に格納してよい。具体的には、出力比率格納部２３４は、画像格納部２１０が格納する画像ＩＤ、音声ＩＤ、及び出力比率を格納する。そして音声出力制御部２２２は、画像出力部２１４から画像が出力されるときに、出力される画像を識別する画像ＩＤに対応付けて格納された複数の音声の中から、出力比率格納部２３４が格納する出力比率に出力回数の比率が近づくように音声を選択して、音声出力部２２４に出力させてよい。 The image storage unit 210 may store an output ratio, which is a ratio of the number of times a plurality of sounds should be output, for each image stored in the image storage unit 210. Specifically, the output ratio storage unit 234 stores the image ID, audio ID, and output ratio stored in the image storage unit 210. Then, when the image is output from the image output unit 214, the audio output control unit 222 has the output ratio storage unit 234 select from the plurality of sounds stored in association with the image ID for identifying the output image. Audio may be selected and output to the audio output unit 224 such that the ratio of the number of output approaches the output ratio to be stored.

なお、制限回数格納部２３６は、音声格納部２２０が格納している音声が音声出力部２２４から出力されているときに音声の出力が制限された制限回数を、画像ＩＤに対応づけて格納する。例えば、制限回数格納部２３６は、音声出力部２２４が音声を再生している場合に、ユーザ１８０による音声の早送り操作等によって音声の再生がキャンセルされる毎に、当該音声の音声ＩＤに対応づけて格納している制限回数を１増加させる。また、制限回数格納部２３６は、ユーザ１８０によるボリュームの操作によって音声出力部２２４が再生している音声の音量が低下させられる毎に、当該音声の音声ＩＤに対応づけて格納する制限回数を増加させてもよい。また、制限回数格納部２３６は、音声出力部２２４が再生している音声の音量の低下量に応じて、格納している制限回数を増加させてもよい。例えば、制限回数格納部２３６は、音声出力部２２４が再生している音声の音量の低下量が予め定められた基準低下量より大きいことを条件として、格納している制限回数を１増加させてよい。そして、制限回数格納部２３６は、音量の低下量が予め定められた基準低下量より小さい場合には、音量の低下量に応じて予め定められた増加回数（例えば、０より大きい、１未満の増加回数）だけ、格納している制限回数を増加させてよい。 The limit number storage unit 236 stores, in association with the image ID, the limit number, which is the number of times output of a sound stored in the sound storage unit 220 was restricted while that sound was being output from the sound output unit 224. For example, while the audio output unit 224 is reproducing a sound, each time reproduction is cancelled by a fast-forward operation or the like by the user 180, the limit number storage unit 236 increases by 1 the limit number stored in association with the audio ID of that sound. The limit number storage unit 236 may also increase the limit number stored in association with the audio ID of a sound each time the volume of that sound, while it is being reproduced by the audio output unit 224, is lowered by a volume operation by the user 180. Further, the limit number storage unit 236 may increase the stored limit number in accordance with the amount by which the volume of the sound being reproduced by the audio output unit 224 is lowered. For example, the limit number storage unit 236 may increase the stored limit number by 1 on condition that the amount of volume decrease is larger than a predetermined reference decrease amount. 
When the amount of volume decrease is smaller than the predetermined reference decrease amount, the limit number storage unit 236 may increase the stored limit number by an increment predetermined according to the amount of volume decrease (for example, an increment greater than 0 and less than 1).

そして、目標回数格納部２３２は、制限回数格納部２３６が格納する回数がより少ない音声の音声ＩＤに対応づけて格納している目標回数をより大きく設定する。また、出力比率格納部２３４は、制限回数格納部２３６が格納する回数がより少ない音声の音声ＩＤに対応づけて格納している出力比率をより大きく設定する。これにより、音声出力制御部２２２は、制限回数格納部２３６が格納する回数がより少ない音声を音声出力部２２４からより高い頻度で出力させることができる。なお、目標回数格納部２３２又は出力比率格納部２３４は、制限回数格納部２３６が格納している回数を出力回数保持部２３０が保持している出力回数で除した値である制限比率を算出して、算出した制限比率がより小さい音声の音声ＩＤに対応づけて格納する目標回数又は出力比率をより大きく設定してもよい。 Then, the target number storage unit 232 sets the target number of times stored in association with the voice ID of the voice whose number of times stored by the limited number of times storage unit 236 is smaller. Further, the output ratio storage unit 234 sets a larger output ratio stored in association with the voice ID of the voice having the smaller number of times the limit number storage unit 236 stores. Thereby, the audio output control unit 222 can cause the audio output unit 224 to output audio with a lower number of times stored in the limit number storage unit 236 at a higher frequency. The target number storage unit 232 or the output ratio storage unit 234 calculates a limit ratio that is a value obtained by dividing the number of times stored in the limit number storage unit 236 by the number of outputs stored in the output number storage unit 230. Thus, the target number of times or the output ratio stored in association with the voice ID of the voice having a smaller calculated limit ratio may be set larger.
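The limit-number bookkeeping and the limit ratio can be sketched in Python as below. The value of `REFERENCE_DROP` and the proportional fractional increment are assumptions made for illustration; the specification only requires that the fractional increment lie between 0 and 1 and depend on the amount of volume decrease.

```python
REFERENCE_DROP = 10.0  # assumed reference decrease amount (arbitrary units)


def limit_increment(cancelled, volume_drop):
    """Amount added to a sound's limit number for one user action."""
    if cancelled or volume_drop > REFERENCE_DROP:
        return 1.0
    if volume_drop > 0:
        # Assumed rule: scale the increment with the size of the decrease.
        return volume_drop / REFERENCE_DROP
    return 0.0


def limit_ratio(limit_count, output_count):
    """Limit number divided by output count; 0.0 if never output."""
    return limit_count / output_count if output_count else 0.0
```

A sound with a smaller limit ratio would then be given a larger target number or output ratio, so that sounds the user rarely skips or turns down are played more often.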

なお、目標回数格納部２３２は、音声格納部２２０が撮像モードに対応づけて格納している音声の目標回数をより大きく設定して格納してよい。また、出力比率格納部２３４は、音声格納部２２０が撮像モードに対応づけて格納している音声の出力比率をより大きく設定して格納してもよい。これにより、音声出力制御部２２２は、撮像装置１００が撮像モードに設定されている間に録音された音声を、待機モード及び出力モードに設定されている間に録音された音声より高い頻度で音声出力部２２４から出力させることができる。なお、目標回数格納部２３２及び出力比率格納部２３４は、制限回数の逆数で示される重み付け係数で重み付けされた目標回数及び出力比率をそれぞれ格納してよい。また、目標回数格納部２３２及び出力比率格納部２３４は、撮像モードに対応づけて格納される音声の目標回数及び出力比率を、待機モード又は出力モードに対応づけて格納される音声より大きい重み付け係数で重み付けして算出してもよい。 Note that the target number of times storage unit 232 may set the target number of times of the sound stored in the sound storage unit 220 in association with the imaging mode to be set larger. In addition, the output ratio storage unit 234 may set the output ratio of the audio stored in the audio storage unit 220 in association with the imaging mode to be set larger. As a result, the audio output control unit 222 causes the audio recorded while the imaging apparatus 100 is set to the imaging mode to be output more frequently than the audio recorded while the standby mode and the output mode are set. The data can be output from the output unit 224. Note that the target number storage unit 232 and the output ratio storage unit 234 may store the target number of times and the output ratio weighted by the weighting coefficient indicated by the reciprocal of the limit number of times, respectively. In addition, the target number of times storage unit 232 and the output ratio storage unit 234 set the target number of times and the output ratio of the sound stored in association with the imaging mode to a weighting coefficient larger than the sound stored in association with the standby mode or the output mode. It is also possible to calculate by weighting.

図８は、音声出力制御部２２２が音声を選択する時間範囲の一例を示す。例えば、ユーザ１８０から、時刻ｔ８０４で撮像された画像を出力する指示を時刻ｔ８０６において受け付けた場合に、出力許容時間設定部２４２は、出力を指示された時刻と出力される画像が撮像された時刻との差（ｔ８０６−ｔ８０４）に基づいて、音声出力部２２４から出力させる音声を選択させる許容範囲Δｔ８５２を決定する。そして、音声出力制御部２２２は、音声格納部２２０に格納されている音声のうち、時刻ｔ８０４からΔｔ８５２だけ前又は後の時間範囲（時刻ｔ８０４―Δｔ８５２〜時刻ｔ８０４＋Δｔ８５２）に録音された音声（＃８４１〜＃８４９）の中から音声を選択して、音声出力部２２４に出力させる。 FIG. 8 shows an example of the time range from which the audio output control unit 222 selects audio. For example, when an instruction to output the image captured at time t804 is received from the user 180 at time t806, the output allowable time setting unit 242 determines the allowable range Δt852, from which the audio to be output by the audio output unit 224 is selected, based on the difference (t806 − t804) between the time at which output was instructed and the time at which the image to be output was captured. The audio output control unit 222 then selects audio from among the audio (#841 to #849) stored in the audio storage unit 220 that was recorded within the time range extending Δt852 before and after time t804 (from time t804 − Δt852 to time t804 + Δt852), and causes the audio output unit 224 to output it.

なお、音声出力制御部２２２は、時刻ｔ８０４から許容範囲Δｔ８５２だけ前の時刻から時刻ｔ８０４までの間に録音された音声を選択してもよいし、時刻ｔ８０４から許容範囲Δｔ８５２だけ後の時刻までの間に録音された音声を選択してもよい。 Note that the audio output control unit 222 may select the audio recorded from the time t804 before the allowable range Δt852 to the time t804, or from the time t804 to the time after the allowable range Δt852. Voices recorded in between may be selected.

また、出力許容時間設定部２４２は、画像格納部２１０が格納する撮像画像が撮像された時刻と、出力する指示を受け付けた時刻との差が大きいほど、音声出力部２２４から出力させる音声を選択させる許容範囲をより大きく設定する。図８の例では、出力許容時間設定部２４２は、時刻ｔ８０４よりも前の時刻ｔ８０２に撮像された画像を出力するよう時刻ｔ８０６において指示された場合には、許容範囲Δｔ８５２に比べて時間的により長い許容範囲Δｔ８５０を設定する。そして、音声出力制御部２２２は、時刻（ｔ８０２−Δｔ８５０）から時刻（ｔ８０２＋Δｔ８５０）までの時間範囲内で録音された音声（＃８１１〜＃８３４）の中から音声を選択して、音声出力部２２４に出力させる。 Further, the larger the difference between the time at which a captured image stored in the image storage unit 210 was captured and the time at which the output instruction was received, the larger the output allowable time setting unit 242 sets the allowable range from which the audio to be output by the audio output unit 224 is selected. In the example of FIG. 8, when instructed at time t806 to output an image captured at time t802, which is earlier than time t804, the output allowable time setting unit 242 sets an allowable range Δt850 that is longer than the allowable range Δt852. The audio output control unit 222 then selects audio from among the audio (#811 to #834) recorded within the time range from time (t802 − Δt850) to time (t802 + Δt850), and causes the audio output unit 224 to output it.

なお、出力許容時間設定部２４２は、撮像された時刻と出力を指示された時刻との間の時間を予め定められた数で割って得られた期間を許容範囲として設定してよい。例えば、音声出力制御部２２２は、１０日前に撮像した画像を出力するときには、撮像した時刻の前後１日の間に録音された音声の中から、出力する音声を選択する。また、小学校３年生のときの運動会の画像を４０年後に出力する場合には、撮像した時刻の前後４年の間に録音された音声から選択する。この場合、小学生時代の運動会の様子を鑑賞しながら、小学校への入学式、卒業式等の、より特徴的な音声が出力されるので、ユーザ１８０はより楽しく画像を鑑賞することができる。 Note that the output allowable time setting unit 242 may set a period obtained by dividing the time between the time when the image is taken and the time when the output is instructed by a predetermined number as an allowable range. For example, when outputting the image captured 10 days ago, the audio output control unit 222 selects the audio to be output from the audio recorded for one day before and after the imaging time. In addition, when an image of an athletic meet at the third grade of elementary school is output after 40 years, it is selected from voices recorded for four years before and after the time when the image was taken. In this case, while watching the state of the athletic meet in elementary school, more characteristic sounds such as an entrance ceremony to the elementary school and a graduation ceremony are output, so that the user 180 can enjoy the image more happily.
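The allowable-range calculation can be illustrated with the short Python sketch below. The divisor of 10 is an assumption chosen to match the two worked examples in this paragraph (10 days giving 1 day, 40 years giving 4 years); the specification only says "a predetermined number". The function names are hypothetical.

```python
from datetime import datetime

DIVISOR = 10  # assumed "predetermined number", consistent with the examples


def allowable_window(captured_at, requested_at, divisor=DIVISOR):
    """Return (earliest, latest) recording times eligible for selection."""
    tolerance = (requested_at - captured_at) / divisor
    return captured_at - tolerance, captured_at + tolerance


def in_window(recorded_at, window):
    """True when a recording time lies inside the allowable window."""
    earliest, latest = window
    return earliest <= recorded_at <= latest


# An image captured 10 days before the output request: +/- 1 day window.
window = allowable_window(datetime(2005, 1, 1), datetime(2005, 1, 11))
```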

以上説明した出力装置１４０によれば、同じ画像を再度出力するときでも、複数の音声の中から出力する音声を選択して出力するので、ユーザ１８０は飽きることなく音声と画像とを容易に楽しむことができる。 According to the output device 140 described above, even when the same image is output again, the audio to be output is selected and output from among a plurality of sounds, so that the user 180 can easily enjoy the sound and the image without getting bored. be able to.

図９は、撮像装置１００のブロック構成を示す。図１０は、出力装置１４０の他の実施例におけるブロック構成を示す。本実施例における出力装置１４０及び撮像装置１００は、複数の画像又は画像をトリミングした複数のトリミング画像を用いて生成された出力画像に対して、適切な出力音声を生成して同期して出力する。撮像装置１００は、撮像部９１０、録音部９２０、オブジェクト抽出部９３０、オブジェクト位置特定部９４０、及び音声取得部９５０を備える。 FIG. 9 shows a block configuration of the imaging apparatus 100. FIG. 10 shows a block configuration of the output device 140 in another embodiment. The output device 140 and the imaging apparatus 100 in this embodiment generate appropriate output sound for an output image generated using a plurality of images, or a plurality of trimmed images obtained by trimming images, and output the sound in synchronization with the output image. The imaging apparatus 100 includes an imaging unit 910, a recording unit 920, an object extraction unit 930, an object position specifying unit 940, and a sound acquisition unit 950.

撮像部９１０は画像を撮像する。オブジェクト抽出部９３０は、撮像部９１０が撮像した画像に含まれるオブジェクトを抽出する。オブジェクト位置特定部９４０は、撮像部９１０が撮像した画像における、音声取得部９５０が取得した音声に関連するオブジェクトの位置を特定する。 The imaging unit 910 captures an image. The object extraction unit 930 extracts an object included in the image captured by the imaging unit 910. The object position specifying unit 940 specifies the position of an object related to the sound acquired by the sound acquisition unit 950 in the image captured by the image capturing unit 910.

音声取得部９５０は、オブジェクト抽出部９３０が抽出したオブジェクトに関連する音声を取得する。具体的には、音声取得部９５０は、オブジェクト抽出部９３０が抽出したオブジェクトの種類に関連する音声を、オブジェクトの種類に対応づけて音声を格納している音声データベース１９０から取得する。そして、音声格納部９６０は、オブジェクト位置特定部９４０が特定したオブジェクトの位置に対応づけて、音声取得部９５０が取得した音声を格納する。 The sound acquisition unit 950 acquires sound related to the object extracted by the object extraction unit 930. Specifically, the voice acquisition unit 950 acquires the voice related to the object type extracted by the object extraction unit 930 from the voice database 190 that stores the voice in association with the object type. The voice storage unit 960 stores the voice acquired by the voice acquisition unit 950 in association with the position of the object specified by the object position specifying unit 940.

なお、録音部９２０は、撮像部９１０の周囲の音声を録音する。なお、図１に関連して説明したマイクロホン１０２は、録音部９２０の一部であってよい。そして、音声取得部９５０は、オブジェクト抽出部９３０が抽出したオブジェクトに関連する音声を、録音部９２０が録音した音声から抽出してもよい。この場合、オブジェクト位置特定部９４０は、撮像部９１０が撮像した画像における、音声取得部９５０が抽出した音声に関連するオブジェクトの位置を特定する。そして、音声格納部９６０は、オブジェクト位置特定部９４０が特定したオブジェクトの位置に対応づけて、音声取得部９５０が抽出した音声を格納する。 Note that the recording unit 920 records the sound around the imaging unit 910. Note that the microphone 102 described with reference to FIG. 1 may be a part of the recording unit 920. Then, the voice acquisition unit 950 may extract the voice related to the object extracted by the object extraction unit 930 from the voice recorded by the recording unit 920. In this case, the object position specifying unit 940 specifies the position of the object related to the sound extracted by the sound acquisition unit 950 in the image captured by the image capturing unit 910. Then, the voice storage unit 960 stores the voice extracted by the voice acquisition unit 950 in association with the position of the object specified by the object position specifying unit 940.

出力装置１４０は、画像格納部１０１０、オブジェクト抽出部１０３０、オブジェクト位置特定部１０４０、音声取得部１０５０、音声格納部１０６０、部分領域範囲取得部１０２０、出力音声生成部１０７０、出力画像生成部１０７５、画像出力部１０８０、及び音声データベース１０９０を備える。 The output device 140 includes an image storage unit 1010, an object extraction unit 1030, an object position specifying unit 1040, an audio acquisition unit 1050, an audio storage unit 1060, a partial area range acquisition unit 1020, an output audio generation unit 1070, an output image generation unit 1075, An image output unit 1080 and an audio database 1090 are provided.

画像格納部１０１０は、画像を格納する。具体的には、画像格納部１０１０は、撮像装置１００が撮像した撮像画像を撮像装置１００から受け取って格納する。音声格納部１０６０は、画像格納部１０１０が格納している画像及び当該画像における位置に対応づけて、音声を格納する。具体的には、音声格納部１０６０は、撮像装置１００の音声格納部１０６０から撮像装置１００によって撮像された画像及び当該画像における位置に対応づけて記録された音声を取得して格納する。 The image storage unit 1010 stores an image. Specifically, the image storage unit 1010 receives a captured image captured by the imaging device 100 from the imaging device 100 and stores it. The audio storage unit 1060 stores audio in association with the image stored in the image storage unit 1010 and the position in the image. Specifically, the audio storage unit 1060 acquires and stores the image captured by the imaging device 100 from the audio storage unit 1060 of the imaging device 100 and the audio recorded in association with the position in the image.

部分領域範囲取得部１０２０は、画像格納部１０１０が格納している画像における少なくとも一部を含む部分領域の範囲を取得する。例えば、部分領域範囲取得部１０２０は、画像格納部１０１０が格納している画像に対するユーザ１８０によるトリミング操作を受け付けて、当該トリミング操作で示されるトリミング範囲を部分領域の範囲として取得する。 The partial area range acquisition unit 1020 acquires a partial area range including at least a part of the image stored in the image storage unit 1010. For example, the partial region range acquisition unit 1020 receives a trimming operation by the user 180 for the image stored in the image storage unit 1010, and acquires the trimming range indicated by the trimming operation as a partial region range.

出力画像生成部１０７５は、画像格納部１０１０が格納している画像における部分領域範囲取得部１０２０が取得した部分領域の範囲の画像から出力画像を生成する。出力音声生成部１０７０は、画像格納部１０１０が格納している画像において部分領域範囲取得部１０２０が取得した部分領域の範囲が存在する位置である全体画像内位置に対応づけて音声格納部１０６０が格納している音声から出力音声を生成する。 The output image generation unit 1075 generates an output image from the partial area range image acquired by the partial area range acquisition unit 1020 in the image stored in the image storage unit 1010. The output sound generation unit 1070 has the sound storage unit 1060 associated with the position in the entire image where the partial region range acquired by the partial region range acquisition unit 1020 exists in the image stored in the image storage unit 1010. Output sound is generated from the stored sound.
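As an illustration of generating output sound only from sounds associated with positions inside the partial region, the following Python sketch filters position-tagged sounds by a trimming rectangle. The coordinate convention (left, top, right, bottom in pixels), the point-based positions, and the sound IDs are assumptions; the specification does not define how positions are represented.

```python
def sounds_in_region(positioned_sounds, region):
    """Return IDs of sounds whose associated position lies inside the region.

    positioned_sounds: dict mapping sound ID -> (x, y) position in the image.
    region: (left, top, right, bottom) of the partial region, right and
            bottom exclusive (assumed convention).
    """
    left, top, right, bottom = region
    return [
        sid
        for sid, (x, y) in positioned_sounds.items()
        if left <= x < right and top <= y < bottom
    ]


positioned_sounds = {"dog_bark": (120, 300), "bird_song": (600, 80)}
kept = sounds_in_region(positioned_sounds, (0, 200, 400, 480))
```

Here the sound tied to the object outside the trimming range is left out, which corresponds to avoiding playback of sounds for objects removed by trimming.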

そして、画像出力部１０８０は、出力画像生成部１０７５が生成した出力画像と出力音声生成部１０７０が生成した出力音声とが同期して出力されるべく、当該出力画像と当該出力音声とを対応づけて出力する。なお、画像出力部１０８０は、出力画像と出力音声とを対応づけて記録媒体に記録してよい。また、画像出力部１０８０は、ディスプレイ等の表示デバイスに出力画像を表示するのと同期して、スピーカ等の再生デバイスから出力音声が出力されるように、出力画像と出力音声とを対応づけて出力してよい。このため、出力装置１４０は、ユーザ１８０のトリミング操作によって得られるトリミング画像を表示する場合に、トリミング画像に含まれるオブジェクトの代表的な音声等の適切な音声をトリミング画像に同期して再生することができる。 The image output unit 1080 then outputs the output image generated by the output image generation unit 1075 and the output sound generated by the output sound generation unit 1070 in association with each other, so that they are output in synchronization. The image output unit 1080 may record the output image and the output sound on a recording medium in association with each other. The image output unit 1080 may also output the output image and the output sound in association with each other so that the output sound is output from a reproduction device such as a speaker in synchronization with display of the output image on a display device such as a display. Accordingly, when displaying a trimmed image obtained by a trimming operation by the user 180, the output device 140 can reproduce an appropriate sound, such as a representative sound of an object included in the trimmed image, in synchronization with the trimmed image.

なお、出力画像生成部１０７５は、画像格納部１０１０が格納している画像における部分領域範囲取得部１０２０が取得した部分領域の範囲の画像と、画像格納部１０１０が格納している他の画像とを合成して出力画像を生成してよい。この場合、出力音声生成部１０７０は、部分領域範囲取得部１０２０が取得した部分領域の範囲が存在する位置である全体画像内位置に対応づけて音声格納部１０６０が格納している音声と、出力画像に含まれる他の画像に対応づけて音声格納部１０６０が格納している音声とから出力音声を生成する。このため、出力装置１４０は、複数の画像を編集して得られる画像を表示する場合に、編集に用いた画像に関連する音声を合成して得られる音声を、編集後の画像の表示に同期して再生することができる。 Note that the output image generation unit 1075 may generate an output image by combining the image within the partial area range acquired by the partial area range acquisition unit 1020, taken from an image stored in the image storage unit 1010, with another image stored in the image storage unit 1010. In this case, the output sound generation unit 1070 generates output sound from the sound stored in the sound storage unit 1060 in association with the position in the entire image where the partial area range acquired by the partial area range acquisition unit 1020 exists, and from the sound stored in the sound storage unit 1060 in association with the other image included in the output image. Therefore, when displaying an image obtained by editing a plurality of images, the output device 140 can reproduce sound obtained by synthesizing the sounds related to the images used for editing, in synchronization with display of the edited image.

音声データベース１０９０は、オブジェクトの種類に対応づけて音声を格納している。そして、音声格納部１０６０は、画像格納部１０１０が格納している画像に対応づけられた位置に存在するオブジェクトの種類に対応づけて音声データベース１０９０が格納している音声を取得して格納する。なお、音声格納部１０６０は、画像格納部１０１０が格納している画像に対応づけられた位置に存在するオブジェクトの種類に対応づけて出力装置１４０の外部の音声データベース１９０が格納している音声を取得して格納してもよい。 The audio database 1090 stores audio in association with object types. Then, the sound storage unit 1060 acquires and stores the sound stored in the sound database 1090 in association with the type of object existing at the position associated with the image stored in the image storage unit 1010. Note that the audio storage unit 1060 stores the audio stored in the audio database 190 outside the output device 140 in association with the type of object existing at the position associated with the image stored in the image storage unit 1010. You may acquire and store.

そして、出力音声生成部１０７０は、出力画像においてより大きい面積を占めるオブジェクトが存在する位置である全体画像内位置及び当該オブジェクトを含む画像に対応づけて音声格納部１０６０が格納している音声を、より強調した出力音声を生成してよい。具体的には、出力音声生成部１０７０は、出力画像においてより大きい面積を占めるオブジェクトが存在する位置である全体画像内位置及び当該オブジェクトを含む画像に対応づけて音声格納部１０６０が格納している音声を、より大きい音量で合成した出力音声を生成してよい。 The output sound generation unit 1070 may then generate output sound that gives greater emphasis to the sound stored in the sound storage unit 1060 in association with the position in the entire image of an object occupying a larger area in the output image and with the image containing that object. Specifically, the output sound generation unit 1070 may generate output sound in which the sound stored in the sound storage unit 1060 in association with the position in the entire image of an object occupying a larger area in the output image, and with the image containing that object, is synthesized at a higher volume.
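A simple way to picture area-weighted emphasis is the Python sketch below, which mixes sample lists with gains proportional to the area each object occupies in the output image. The linear gain rule, the plain lists of samples, and the sound IDs are assumptions for illustration; the specification only states that sounds of larger objects are synthesized at a higher volume.

```python
def mix_by_area(tracks, areas):
    """Mix tracks with per-track gain = object's area / total area.

    tracks: dict mapping sound ID -> list of audio samples (floats).
    areas:  dict mapping sound ID -> area the related object occupies
            in the output image (any consistent unit).
    """
    total_area = sum(areas.values())
    length = max(len(samples) for samples in tracks.values())
    mixed = [0.0] * length
    for sid, samples in tracks.items():
        gain = areas[sid] / total_area  # assumed linear weighting
        for i, sample in enumerate(samples):
            mixed[i] += gain * sample
    return mixed


tracks = {"dog": [1.0, 1.0], "bird": [1.0, 0.0]}
areas = {"dog": 3, "bird": 1}  # the dog fills three times the bird's area
mixed = mix_by_area(tracks, areas)
```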

また、出力音声生成部１０７０は、出力画像においてより前面に配置された画像内のオブジェクトが存在する位置である全体画像内位置及び当該オブジェクトを含む画像に対応づけて音声格納部１０６０が格納している音声を、より強調した出力音声を生成してよい。具体的には、出力音声生成部１０７０は、出力画像においてより前面に配置された画像内のオブジェクトが存在する位置である全体画像内位置及び当該オブジェクトを含む画像に対応づけて音声格納部１０６０が格納している音声を、より大きい音量で合成した出力音声を生成する。 In addition, the output sound generation unit 1070 is stored in the sound storage unit 1060 in association with the position in the entire image, which is the position where the object in the image arranged in the foreground in the output image exists, and the image including the object. The output voice may be generated with more emphasis on the existing voice. Specifically, the output sound generation unit 1070 is configured so that the sound storage unit 1060 associates the position in the entire image, which is the position where the object in the image arranged in front with the output image exists, and the image including the object. An output sound is generated by synthesizing the stored sound with a larger volume.

なお、出力画像生成部１０７５は、出力画像の生成に用いた複数の画像のそれぞれに対応づけて音声格納部１０６０が格納している複数の音声が、連続して出力される出力音声を生成してよい。この場合、出力音声生成部１０７０は、出力画像においてより大きい面積を占めるオブジェクトが存在する位置である全体画像内位置及び当該オブジェクトを含む画像に対応づけて音声格納部１０６０が格納している音声がより長い時間出力される出力音声を生成してもよい。また、出力音声生成部１０７０は、出力画像においてより前面に配置された画像内のオブジェクトが存在する位置である全体画像内位置に対応づけて音声格納部１０６０が格納している音声がより長い時間出力される出力音声を生成してもよい。 Note that the output image generation unit 1075 may generate output sound in which the plurality of sounds stored in the sound storage unit 1060 in association with each of the plurality of images used to generate the output image are output one after another. In this case, the output sound generation unit 1070 may generate output sound in which the sound stored in the sound storage unit 1060 in association with the position in the entire image of an object occupying a larger area in the output image, and with the image containing that object, is output for a longer time. The output sound generation unit 1070 may also generate output sound in which the sound stored in the sound storage unit 1060 in association with the position in the entire image of an object in an image arranged further toward the front in the output image is output for a longer time.

As described above, the output device 140 can reproduce an image created from trimmed images, which the user 180 obtains by freely trimming images, together with the sounds of the objects contained in that image. This prevents inappropriate sounds from being reproduced, such as a sound related to an object removed by trimming being reproduced together with the output image. The output device 140 can also provide the user 180, together with the composite image, with a sound in which the sounds related to the objects emphasized in the composite image are themselves emphasized.

Note that the sound storage unit 1060 may store a plurality of sounds in association with each of the plurality of images stored in the image storage unit 1010. The output image generation unit 1075 may then generate an output image by combining a plurality of images stored in the image storage unit 1010. For example, the output image generation unit 1075 generates an output image by arranging a plurality of images selected by the user 180 in a layout designated by the user 180. In this case, the output sound generation unit 1070 generates an output sound using a first sound and a second sound that the sound storage unit 1060 stores in association with, respectively, a first image and a second image contained in the output image generated by the output image generation unit 1075. When the first image is emphasized more than the second image in that output image, the output sound generation unit 1070 generates an output sound in which the first sound is mixed with greater emphasis than the second sound. The output device 140 can therefore output, in synchronization with the output image, an output sound that emphasizes the sounds related to the images the user 180 emphasized in the layout of the output image.

Specifically, when the first image is larger than the second image in the output image generated by the output image generation unit 1075, the output sound generation unit 1070 generates an output sound in which the first sound is mixed with greater emphasis than the second sound. Likewise, when the first image is in front of the second image in the output image, the output sound generation unit 1070 generates an output sound in which the first sound is mixed with greater emphasis than the second sound. The same applies when the first image is located closer to the center of the output image than the second image. Note that, when the first image is emphasized more than the second image in the output image, the output sound generation unit 1070 may generate an output sound in which the first sound is mixed at a higher volume than the second sound.

FIG. 11 is a diagram showing an example of the data stored in the sound database 1090. The sound database 1090 stores object types and sound data. For example, in association with each object type such as dog, bird, or wave, the sound database 1090 stores a representative sound for that object, such as a dog's bark, a bird's song, or the sound of waves. Note that the sound database 190 may store the same kind of data as the sound database 1090 in the example of this figure.

FIG. 12 is a diagram showing an example of an image 1200 stored in the image storage unit 1010. The operation by which the sound acquisition unit 1050 acquires a sound is described below using the image 1200 as an example. The object extraction unit 1030 extracts the contours of objects such as a dog 1210 and a bird 1220 from the image 1200 by edge extraction or a similar technique. The object extraction unit 1030 then performs pattern matching between the extracted contours and object patterns stored in advance for each object type, such as dog or bird, and identifies the object type whose degree of matching exceeds a predetermined level and is the highest. The sound acquisition unit 1050 then acquires the sound that the sound database 1090 or the sound database 190 stores in association with the identified object type.
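The type identification and sound lookup described for FIG. 12 can be sketched as follows. This is only an illustrative sketch: the matching scores are assumed to have been computed already by some pattern-matching step, and the names `identify_object_type`, `acquire_sound`, `SOUND_DATABASE`, and the file names are hypothetical and do not appear in the specification.

```python
from typing import Dict, Optional

# Hypothetical stand-in for the sound database 1090: object type -> representative sound.
SOUND_DATABASE: Dict[str, str] = {
    "dog": "dog_bark.wav",
    "bird": "bird_song.wav",
    "wave": "wave_sound.wav",
}


def identify_object_type(scores: Dict[str, float], threshold: float) -> Optional[str]:
    """Return the object type with the highest matching score,
    but only if that score exceeds the predetermined threshold."""
    if not scores:
        return None
    best_type = max(scores, key=lambda t: scores[t])
    return best_type if scores[best_type] > threshold else None


def acquire_sound(scores: Dict[str, float], threshold: float) -> Optional[str]:
    """Look up the representative sound for the identified object type, if any."""
    object_type = identify_object_type(scores, threshold)
    if object_type is None:
        return None
    return SOUND_DATABASE.get(object_type)
```

With these assumptions, a contour that matches the dog pattern best, and above the threshold, yields the dog's bark, while a contour whose best score stays at or below the threshold yields no sound.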

Note that the sound acquisition unit 950 of the imaging device 100 can, by the same operation as the sound acquisition unit 1050, acquire a sound from the sound database 190 in association with an object in an image captured by the imaging unit 910. Alternatively, the sound acquisition unit 950 may store sound feature values in advance in association with object types, compare the stored feature values with the feature values of the sound recorded by the recording unit 920, and extract from the recorded sound the sound whose feature value has a degree of matching that exceeds a predetermined level and is the highest. Here, a sound feature value may be a characteristic frequency spectrum of the sound or a characteristic temporal change pattern of that frequency spectrum.

FIG. 13 shows an example of the data stored in the sound storage unit 960 or the sound storage unit 1060. Taking the data stored in the sound storage unit 1060 as an example, the sound storage unit 1060 stores an image ID that identifies an image captured by the imaging unit 910, a position in the entire image, which is the position of an object contained in that image, and the sound data acquired by the sound acquisition unit 1050. The object position specifying unit 1040 specifies the position of the center of gravity of the object extracted by the object extraction unit 1030, and the sound storage unit 1060 stores that center-of-gravity position as the object's position in the entire image. The sound storage unit 1060 may store the position in the entire image as values relative to the width and height of the image. Specifically, the sound storage unit 1060 stores coordinates relative to the width and height of the image, with the lower left corner of the image as the origin.
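The relative center-of-gravity coordinates described for FIG. 13 can be illustrated with a short sketch. It assumes that an object is given as a list of pixel coordinates `(column, row)` with row 0 at the top of the image, as is common in image arrays; that input convention and the function name `relative_centroid` are assumptions made for illustration and are not taken from the specification.

```python
from typing import Iterable, Tuple


def relative_centroid(
    pixels: Iterable[Tuple[int, int]], width: int, height: int
) -> Tuple[float, float]:
    """Return the object's center of gravity as (x, y) relative to the image
    width and height, with the origin at the lower left corner of the image.

    pixels: (column, row) pairs, with row 0 assumed to be the top of the image.
    """
    points = list(pixels)
    if not points:
        raise ValueError("object has no pixels")
    mean_col = sum(col for col, _ in points) / len(points)
    mean_row = sum(row for _, row in points) / len(points)
    # Flip the vertical axis so that y grows upward from the lower left corner.
    return mean_col / width, 1.0 - mean_row / height
```

Storing relative rather than absolute coordinates keeps the stored position meaningful when the image is later scaled.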

Note that the sound storage unit 1060 may store a sound in association with an image without associating it with a position in the entire image. In the example of this figure, the sound storage unit 1060 stores sound data 13 in association with image ID #AAA and sound data 22 in association with image ID #BBB. In this case, as the position in the entire image for sound data 13 and 22, the sound storage unit 1060 stores a value (such as a NULL value) indicating that the sound is not associated with any position in the entire image. The data stored in the sound storage unit 1060 has been described above; the sound storage unit 960 may store the same kind of data.
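One way to picture the records of FIG. 13, including sounds that are tied to an image but not to a position, is the following sketch. The record layout, the coordinate values, and the helper `image_level_sounds` are hypothetical; only the image IDs and the sound data numbers are taken from the description.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SoundRecord:
    image_id: str
    # Relative position in the entire image, or None (the NULL value) when the
    # sound is associated with the image as a whole rather than with an object.
    position: Optional[Tuple[float, float]]
    sound_data: str


# Coordinates below are made-up illustration values.
RECORDS: List[SoundRecord] = [
    SoundRecord("#AAA", (0.3, 0.4), "sound data 11"),
    SoundRecord("#AAA", None, "sound data 13"),
    SoundRecord("#BBB", (0.6, 0.2), "sound data 21"),
    SoundRecord("#BBB", None, "sound data 22"),
]


def image_level_sounds(records: List[SoundRecord], image_id: str) -> List[str]:
    """Return the sounds stored for an image without a position in the entire image."""
    return [
        r.sound_data
        for r in records
        if r.image_id == image_id and r.position is None
    ]
```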

FIG. 14 shows an example of an output image generated by the output image generation unit 1075. In the example of this figure, the output device 140 generates and displays an output image 1450 from the image 1200 identified by image ID #AAA and an image 1400 identified by image ID #BBB. In this example, the output image generation unit 1075 generates the output image 1450 by arranging partial images 1411 and 1412, whose ranges the partial region range acquisition unit 1020 acquired from trimming instructions given by the user 180, in the layout indicated by the user 180.

At this time, the output sound generation unit 1070 calculates the area that each of the images 1200 and 1400 occupies in the output image 1450. The output sound generation unit 1070 then generates the output sound by mixing the sounds 13 and 22, which the sound storage unit 1060 stores in association with the images 1200 and 1400, each at a volume proportional to the area of the corresponding image in the output image 1450. As a result, the sound (sound data 22) recorded when the image 1400 (#BBB) was captured, which contains a child's voice and the like, is reproduced loudly because the image 1400 occupies a large area in the output image 1450. Conversely, the sound recorded when the image 1200 (#AAA) was captured is not reproduced loudly because the image 1200 occupies only a small area in the output image 1450, so the user 180 can view the output image 1450 without any sense of incongruity.
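The area-proportional mixing described for FIG. 14 can be sketched as follows. The sketch assumes that each sound is a list of samples of equal length and that the weights are normalized to sum to 1; the normalization, the sample representation, and the names `area_weights` and `mix` are assumptions for illustration, since the description only states that the volume is proportional to the area.

```python
from typing import Dict, List


def area_weights(areas: Dict[str, float]) -> Dict[str, float]:
    """Turn the area each image occupies in the output image into mixing weights
    that are proportional to the areas and sum to 1."""
    total = sum(areas.values())
    if total <= 0:
        raise ValueError("total area must be positive")
    return {image_id: area / total for image_id, area in areas.items()}


def mix(sounds: Dict[str, List[float]], weights: Dict[str, float]) -> List[float]:
    """Mix equally long sample lists, scaling each sound by its weight."""
    length = len(next(iter(sounds.values())))
    mixed = [0.0] * length
    for image_id, samples in sounds.items():
        for i, sample in enumerate(samples):
            mixed[i] += weights[image_id] * sample
    return mixed
```

For example, if image #BBB occupies three times the area of image #AAA, its sound is mixed with weight 0.75 against 0.25 for the sound of #AAA.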

The output sound generation unit 1070 may also generate the output sound according to the areas of the objects in the output image 1450. Specifically, the output sound generation unit 1070 calculates the area in the output image of each object contained in the ranges acquired by the partial region range acquisition unit 1020 (for example, an object 1421 representing a dog and an object 1422 representing the sea). The output sound generation unit 1070 then acquires the sound data 11 and 21 that the sound storage unit 1060 stores in association with the positions in the entire image of the objects 1421 and 1422 and with the image IDs of the images 1200 and 1400, and generates the output sound by mixing the acquired sound data 11 and 21, each at a volume proportional to the area of the corresponding object. Instead of the output sound itself, the output sound generation unit 1070 may generate, as the output sound, identification information identifying the sound data to be used and volume information indicating the volume of each piece of sound data. As described above, when the output device 140 displays the output image 1450, the sound of waves, for example, is reproduced at a higher volume than the dog's bark. In this way, by using the output device 140, the user 180 can view an image that the user 180 has freely edited together with sound that does not feel out of place with the content of that image.

In this figure, the case where the output sound generation unit 1070 mixes sounds at volumes corresponding to the areas of images or objects in the output image 1450 has been described. In addition to the size of the area, however, the output sound generation unit 1070 may determine the mixing ratio of the sounds according to the placement of the images or objects in the output image 1450. For example, the output sound generation unit 1070 may mix the sounds at a mixing ratio weighted by a weighting coefficient proportional to the reciprocal of the distance from the center of the output image 1450. The output sound generation unit 1070 may also use a larger weighting coefficient for a sound corresponding to an image or object placed further toward the front of the output image 1450. The mixing ratio of the sounds may be a ratio of volumes, as described in connection with this figure, or a ratio of the times for which the sounds are reproduced. The output sound generation unit 1070 may also generate, as the output sound, only the sound corresponding to the image or object with the largest area in the output image 1450. Alternatively, the output sound generation unit 1070 may generate, as the output sound, the sound corresponding to the image placed frontmost in the output image 1450 or to an object in that image.
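The distance-based weighting mentioned above can be sketched as follows. Positions are assumed to be relative coordinates in the output image with its center at (0.5, 0.5), and a small minimum distance is assumed so that an item located exactly at the center does not cause a division by zero; both assumptions, and the name `center_weights`, are illustrative and not part of the description.

```python
import math
from typing import Dict, Tuple


def center_weights(
    positions: Dict[str, Tuple[float, float]],
    center: Tuple[float, float] = (0.5, 0.5),
    min_distance: float = 1e-6,
) -> Dict[str, float]:
    """Return mixing weights proportional to the reciprocal of each item's
    distance from the center of the output image, normalized to sum to 1."""
    raw = {}
    for key, (x, y) in positions.items():
        distance = math.hypot(x - center[0], y - center[1])
        # Clamp the distance so an item exactly at the center stays finite.
        raw[key] = 1.0 / max(distance, min_distance)
    total = sum(raw.values())
    return {key: value / total for key, value in raw.items()}
```

With these assumptions, an item at one third of the distance of another receives three times its weight, so the sound of the item nearer the center dominates the mix.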

FIG. 15 shows an example of the hardware configuration of a computer 1500 for the imaging device 100 and the output device 140. The computer 1500 comprises a CPU peripheral section having a CPU 1505, a RAM 1520, a graphic controller 1575, and a display device 1580 that are interconnected by a host controller 1582; an input/output section having a communication interface 1530, a hard disk drive 1540, and a CD-ROM drive 1560 that are connected to the host controller 1582 by an input/output controller 1584; and a legacy input/output section having a ROM 1510, a flexible disk drive 1550, and an input/output chip 1570 that are connected to the input/output controller 1584.

The host controller 1582 connects the RAM 1520 to the CPU 1505 and the graphic controller 1575, which access the RAM 1520 at a high transfer rate. The CPU 1505 operates based on programs stored in the ROM 1510 and the RAM 1520 and controls each part. The graphic controller 1575 acquires image data that the CPU 1505 or the like generates in a frame buffer provided in the RAM 1520 and displays it on the display device 1580. Alternatively, the graphic controller 1575 may itself contain a frame buffer that stores the image data generated by the CPU 1505 or the like.

The input/output controller 1584 connects the host controller 1582 to the hard disk drive 1540, the communication interface 1530, and the CD-ROM drive 1560, which are relatively high-speed input/output devices. The hard disk drive 1540 stores programs and data used by the CPU 1505 in the computer 1500. The communication interface 1530 communicates with the output device 140 via a network and provides programs and data to the output device 140. The CD-ROM drive 1560 reads a program or data from a CD-ROM 1595 and provides it to the hard disk drive 1540 and the communication interface 1530 via the RAM 1520.

The input/output controller 1584 is also connected to the ROM 1510 and to relatively low-speed input/output devices, namely the flexible disk drive 1550 and the input/output chip 1570. The ROM 1510 stores a boot program that the computer 1500 executes at startup, programs that depend on the hardware of the computer 1500, and the like. The flexible disk drive 1550 reads a program or data from a flexible disk 1590 and provides it to the hard disk drive 1540 and the communication interface 1530 via the RAM 1520. The input/output chip 1570 connects the flexible disk drive 1550, and connects various input/output devices via, for example, a parallel port, a serial port, a keyboard port, or a mouse port.

The program provided to the communication interface 1530 via the RAM 1520 is stored on a recording medium such as the flexible disk 1590, the CD-ROM 1595, or an IC card and is supplied by a user. The program is read from the recording medium, provided to the communication interface 1530 via the RAM 1520, and transmitted to the output device 140 via the network. The program transmitted to the output device 140 is installed and executed on the output device 140.

The program installed and executed on the output device 140 causes the output device 140 to function as the output device 140 described with reference to FIGS. 1 to 14. Likewise, the program installed and executed on the imaging device 100 causes the imaging device 100 to function as the imaging device 100 described with reference to FIGS. 1 to 14.

The programs described above may be stored on an external storage medium. Besides the flexible disk 1590 and the CD-ROM 1595, the storage medium may be an optical recording medium such as a DVD or PD, a magneto-optical recording medium such as an MD, a tape medium, a semiconductor memory such as an IC card, or the like. A storage device such as a hard disk or RAM provided in a server system connected to a dedicated communication network or the Internet may also be used as the recording medium, and the program may be provided to the computer 1500 via the network.

The present invention has been described above using embodiments, but the technical scope of the present invention is not limited to the scope described in those embodiments. Various changes or improvements can be made to the embodiments. It is apparent from the claims that embodiments with such changes or improvements can also fall within the technical scope of the present invention.

FIG. 1 is a diagram showing an example of a sound output system.
FIG. 2 is a diagram showing an example of the block configuration of the output device 140.
FIG. 3 is a diagram showing, in table form, an example of the data stored in the sound storage unit 220.
FIG. 4 is a diagram showing an example of the temporal relationship between sound recording and image capture.
FIG. 5 is a diagram showing, in table form, another example of the data stored in the sound storage unit 220.
FIG. 6 is a diagram showing, in table form, an example of the data stored in the target number storage unit 232.
FIG. 7 is a diagram showing, in table form, an example of the data stored in the output ratio storage unit 234.
FIG. 8 is a diagram showing an example of the time range from which the sound output control unit 222 selects sounds.
FIG. 9 is a diagram showing an example of the block configuration of the imaging device 100.
FIG. 10 is a diagram showing the block configuration of the output device 140 in another embodiment.
FIG. 11 is a diagram showing an example of the data stored in the sound database 1090.
FIG. 12 is a diagram showing an example of an image stored in the image storage unit 1010.
FIG. 13 is a diagram showing an example of the data stored in the sound storage unit 960 or the sound storage unit 1060.
FIG. 14 is a diagram showing an example of an output image generated by the output image generation unit 1075.
FIG. 15 is a diagram showing an example of the hardware configuration of the computer 1500 for the imaging device 100 and the output device 140.

Explanation of symbols

100 imaging device
140 output device
150 communication line
180 user
190 sound database
210 image storage unit
212 image output control unit
214 image output unit
220 sound storage unit
222 sound output control unit
224 sound output unit
230 output number holding unit
232 target number storage unit
234 output ratio storage unit
236 limit number storage unit
240 output instruction receiving unit
242 output allowable time setting unit
244 output time detection unit
910 imaging unit
920 recording unit
930 object extraction unit
940 object position specifying unit
950 sound acquisition unit
960 sound storage unit
1010 image storage unit
1020 partial region range acquisition unit
1030 object extraction unit
1040 object position specifying unit
1050 sound acquisition unit
1060 sound storage unit
1070 output sound generation unit
1075 output image generation unit
1080 image output unit
1090 sound database

Claims

An output device comprising:
an image storage unit for storing a plurality of captured images;
an image output unit for outputting an image stored in the image storage unit;
an image output control unit that causes the image output unit to output an image stored in the image storage unit;
a sound storage unit for storing a plurality of recorded sounds;
a sound output unit for outputting a sound stored in the sound storage unit; and
a sound output control unit that, when the image output unit outputs an image, selects a first sound from the plurality of sounds stored in the sound storage unit and causes the sound output unit to output it, and, when the image output unit outputs the same image again, selects a second sound different from the first sound from the plurality of sounds stored in the sound storage unit and causes the sound output unit to output it.

The output device according to claim 1, wherein the sound storage unit stores a plurality of sounds recorded by a recording function of an imaging device that has captured a plurality of images stored in the image storage unit.

The output device according to claim 2, wherein the sound storage unit stores sounds recorded by the recording function while the plurality of images stored in the image storage unit were being captured, and sounds recorded by the recording function while those images were not being captured.

The output device according to claim 2, wherein the sound storage unit stores a plurality of sounds recorded during a recording period that includes, and is longer than, an imaging period, the imaging period being a period that contains the times at which the plurality of images stored in the image storage unit were each captured.

The output device according to claim 1, wherein the total time of the plurality of sounds stored in the sound storage unit is longer than a time obtained by multiplying a preset output time, for which the image output unit outputs one image, by a number.

The output device according to claim 1, wherein
the image storage unit stores, in association with each of the plurality of images, information indicating the timing at which that image was captured,
the sound storage unit stores, in association with each of the plurality of sounds, information indicating the timing at which that sound was recorded, and
the sound output control unit selects sounds in order of how close their recording timing is to the timing at which the image was captured, closest first.

The output device according to claim 6, wherein
the image storage unit stores, in association with each of the plurality of images, the time at which that image was captured,
the sound storage unit stores, in association with each of the plurality of sounds, the time at which that sound was recorded, and
the sound output control unit selects sounds in order of how close their recording time is to the time at which the image was captured, closest first.

The output device according to claim 1, wherein the sound output control unit selects sound in descending order of volume.

The output device according to claim 1, further comprising:
an output number holding unit that counts and holds the number of times each of the plurality of sounds stored in the sound storage unit has been output by the sound output unit; and
a target number storage unit that stores a target number of times each of the plurality of sounds stored in the sound storage unit should be output by the sound output unit,
wherein the sound output control unit selects sounds in descending order of the value obtained by subtracting the number of outputs from the target number.

The output device according to claim 1, further comprising:
an output number holding unit that counts and holds the number of times each of the plurality of sounds stored in the sound storage unit has been output by the sound output unit; and
an output ratio storage unit that stores an output ratio, which is the ratio of the numbers of times the plurality of sounds stored in the sound storage unit should be output by the sound output unit,
wherein the sound output control unit selects a sound such that the ratio of the numbers of outputs held by the output number holding unit approaches the output ratio stored in the output ratio storage unit.

The output device according to claim 2, wherein
the sound storage unit stores each of a plurality of sounds, recorded by the recording function both in an imaging mode, which is an operation mode in which the imaging device accepts imaging operations, and in a non-imaging mode, which is an operation mode in which the imaging device does not accept imaging operations, in association with the operation mode in effect when that sound was recorded, and
the sound output control unit selects sounds recorded while the imaging device was in the imaging mode in preference to sounds recorded while the imaging device was in the non-imaging mode.

The output device according to claim 1, further comprising a limit number storage unit that counts and stores, for each sound stored in the sound storage unit, the number of times output of that sound was limited while it was being output from the sound output unit,
wherein the sound output control unit preferentially selects a sound for which the number of times stored in the limit number storage unit is smaller.

The output device according to claim 1, further comprising:
an output instruction receiving unit that receives an instruction to cause the image output unit to output the plurality of images stored in the image storage unit; and
an output time detection unit that detects the time at which the output instruction receiving unit received the instruction,
wherein the image storage unit stores, in association with each of the plurality of images, the time at which that image was captured,
the sound storage unit stores, in association with each of the plurality of sounds, the time at which that sound was recorded, and
the sound output control unit sets, based on the difference between the time detected by the output time detection unit and the times at which the plurality of images stored in the image storage unit were captured, an allowable range for the difference between the time at which an image stored in the image storage unit was captured and the time at which a sound selected from the plurality of sounds stored in the sound storage unit was recorded.

14. The output device wherein the sound output control unit sets the allowable range of the difference between the times at which the plurality of images stored in the image storage unit were captured and the time at which a sound selected from the plurality of sounds stored in the sound storage unit was recorded to be larger as the difference between the time detected by the output time detection unit and the times at which those images were captured becomes larger.
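The widening of the allowable range with the age of the images can be illustrated with a small Python sketch. It simplifies to a single image, and the linear rule, the one-hour base and the 1% ratio are assumptions chosen for the example; the claims only require that the range grow as the gap between output time and capture time grows.

```python
from datetime import datetime, timedelta


def allowable_range(output_time, capture_time,
                    base=timedelta(hours=1), ratio=0.01):
    """Tolerance between capture time and recording time.

    Grows linearly with the age of the image at output time
    (base and ratio are hypothetical parameters).
    """
    age = output_time - capture_time
    return base + age * ratio


def candidate_sounds(sounds, capture_time, output_time):
    """Return names of sounds recorded within the allowable range of capture_time."""
    tolerance = allowable_range(output_time, capture_time)
    return [name for name, recorded in sounds
            if abs(recorded - capture_time) <= tolerance]


capture = datetime(2005, 1, 1, 12, 0)
sounds = [("a", capture + timedelta(minutes=30)),
          ("b", capture + timedelta(days=2))]

soon = candidate_sounds(sounds, capture, datetime(2005, 1, 2))       # viewed the next day
later = candidate_sounds(sounds, capture, datetime(2006, 1, 1, 12))  # viewed a year later
```

With these values, viewing the image a year later admits the sound recorded two days after capture, while viewing it the next day does not.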

An output method comprising:
An image storage step of storing a plurality of captured images;
An image output step of outputting an image stored in the image storage step;
An image output control step of causing the image stored in the image storage step to be output in the image output step;
A sound storage step of storing a plurality of recorded sounds;
A sound output step of outputting a sound stored in the sound storage step; and
A sound output control step of selecting a first sound from the plurality of sounds stored in the sound storage step and causing it to be output in the sound output step when an image is output in the image output step, and of selecting a second sound different from the first sound from the plurality of sounds stored in the sound storage step and causing it to be output in the sound output step when the same image is output again in the image output step.
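A minimal Python sketch of the sound output control step follows, under the assumption that it is enough to remember the sound last used for each image and to exclude it on the next output. The class name SoundOutputController, the image identifier and the random choice among the remaining sounds are hypothetical details, not taken from the publication.

```python
import random


class SoundOutputController:
    """Selects a sound per image output, avoiding the one used last time."""

    def __init__(self, sounds, rng=None):
        if len(set(sounds)) < 2:
            raise ValueError("need at least two distinct sounds")
        self.sounds = list(sounds)
        self.last = {}                      # image id -> sound chosen last time
        self.rng = rng or random.Random()

    def select(self, image_id):
        previous = self.last.get(image_id)
        # On a repeat output of the same image, exclude the previous sound.
        choices = [s for s in self.sounds if s != previous]
        chosen = self.rng.choice(choices)
        self.last[image_id] = chosen
        return chosen


controller = SoundOutputController(["waves", "birds", "wind"], random.Random(0))
first = controller.select("IMG_0001")
second = controller.select("IMG_0001")   # same image again: a different sound
```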

A program for an output device that outputs images, the program causing the output device to function as:
An image storage unit that stores a plurality of captured images;
An image output unit that outputs an image stored in the image storage unit;
An image output control unit that causes the image output unit to output an image stored in the image storage unit;
A sound storage unit that stores a plurality of recorded sounds;
A sound output unit that outputs a sound stored in the sound storage unit; and
A sound output control unit that, when the image output unit outputs an image, selects a first sound from the plurality of sounds stored in the sound storage unit and causes the sound output unit to output it, and that, when the image output unit outputs the same image again, selects a second sound different from the first sound from the plurality of sounds stored in the sound storage unit and causes the sound output unit to output it.

An imaging apparatus comprising:
An imaging unit;
An object extraction unit that extracts an object included in the image captured by the imaging unit;
A sound acquisition unit that acquires a sound related to the object extracted by the object extraction unit;
An object position specifying unit that specifies, in the image captured by the imaging unit, the position of the object related to the sound acquired by the sound acquisition unit; and
A sound storage unit that stores the sound acquired by the sound acquisition unit in association with the position of the object specified by the object position specifying unit.
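The association between an acquired sound and the position of its object can be pictured with the following Python sketch. Object extraction is assumed to have already happened, and the lookup from object label to sound stands in for the sound acquisition unit; DetectedObject, StoredSound, store_sounds and the file name are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectedObject:
    label: str
    position: tuple      # (x, y) of the object in the captured image


@dataclass(frozen=True)
class StoredSound:
    image_id: str
    position: tuple      # position of the related object in the image
    sound: str           # identifier of the acquired sound


def store_sounds(image_id, objects, sound_for_label):
    """Store each acquired sound together with the position of its object."""
    stored = []
    for obj in objects:
        sound = sound_for_label.get(obj.label)   # stand-in for sound acquisition
        if sound is not None:
            stored.append(StoredSound(image_id, obj.position, sound))
    return stored


objects = [DetectedObject("dog", (120, 80)), DetectedObject("tree", (10, 10))]
stored = store_sounds("IMG_0001", objects, {"dog": "bark.wav"})
```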

The imaging apparatus according to claim 17, further comprising a recording unit that records sound around the imaging unit, wherein:
The sound acquisition unit extracts, from the sound recorded by the recording unit, the sound related to the object extracted by the object extraction unit;
The object position specifying unit specifies, in the image captured by the imaging unit, the position of the object related to the sound extracted by the sound acquisition unit; and
The sound storage unit stores the sound extracted by the sound acquisition unit in association with the position of the object specified by the object position specifying unit.

An imaging method comprising:
An imaging step of capturing an image;
An object extraction step of extracting an object included in the image captured in the imaging step;
A sound acquisition step of acquiring a sound related to the object extracted in the object extraction step;
An object position specifying step of specifying, in the image captured in the imaging step, the position of the object related to the sound acquired in the sound acquisition step; and
A sound storage step of storing the sound acquired in the sound acquisition step in association with the position of the object specified in the object position specifying step.

A program for an imaging device that captures an image, the program causing the imaging device to function as:
An imaging unit;
An object extraction unit that extracts an object included in the image captured by the imaging unit;
A sound acquisition unit that acquires a sound related to the object extracted by the object extraction unit;
An object position specifying unit that specifies, in the image captured by the imaging unit, the position of the object related to the sound acquired by the sound acquisition unit; and
A sound storage unit that stores the sound acquired by the sound acquisition unit in association with the position of the object specified by the object position specifying unit.

An output device comprising:
An image storage unit that stores an image;
A sound storage unit that stores a sound in association with an image stored in the image storage unit and a position in that image;
A partial region range acquisition unit that acquires the range of a partial region including at least a part of the image stored in the image storage unit;
An output image generation unit that generates an output image from the portion of the image stored in the image storage unit that lies within the range of the partial region acquired by the partial region range acquisition unit;
An output sound generation unit that generates an output sound from the sound that the sound storage unit stores in association with a position, in the entire image stored in the image storage unit, at which the range of the partial region acquired by the partial region range acquisition unit exists; and
An image output unit that outputs the output image and the output sound in association with each other so that the output image generated by the output image generation unit and the output sound generated by the output sound generation unit are output in synchronization.
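As a rough Python illustration of the output sound generation unit, the sketch below keeps only the sounds whose stored positions fall inside the acquired partial region. Representing the region as a (left, top, right, bottom) rectangle and the stored sounds as (position, name) pairs is an assumption for this example.

```python
def sounds_in_region(entries, region):
    """Return the sounds whose stored position lies inside the partial region.

    entries: list of ((x, y), sound) pairs for one stored image
    region:  (left, top, right, bottom) in the coordinates of the entire image
    """
    left, top, right, bottom = region
    return [sound for (x, y), sound in entries
            if left <= x < right and top <= y < bottom]


entries = [((120, 80), "bark.wav"), ((600, 40), "birdsong.wav")]

# Cropping to the left half of an 800x600 image keeps only the dog's sound.
selected = sounds_in_region(entries, (0, 0, 400, 600))
```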

The output device according to claim 21, wherein:
The output image generation unit generates the output image by combining the portion of the image stored in the image storage unit that lies within the range of the partial region acquired by the partial region range acquisition unit with another image stored in the image storage unit;
The output sound generation unit generates the output sound from the sound that the sound storage unit stores in association with the position, in the entire image, at which the range of the partial region acquired by the partial region range acquisition unit exists, and from the sound that the sound storage unit stores in association with the other image included in the output image; and
The image output unit outputs the output image and the output sound in association with each other so that the output image generated by the output image generation unit and the output sound generated by the output sound generation unit are output in synchronization.

23. The output device further comprising a sound database that stores sounds in association with object types,
wherein the sound storage unit acquires and stores the sound that the sound database stores in association with the type of the object present at the position associated with the image stored in the image storage unit.

24. The output device according to claim 23, wherein the output sound generation unit generates an output sound that gives greater emphasis to the sound that the sound storage unit stores in association with the image containing an object occupying a larger area in the output image and with the position in the entire image at which that object is present.

The output device according to claim 24, wherein the output sound generation unit generates the output sound by synthesizing at a higher volume the sound that the sound storage unit stores in association with the image containing an object occupying a larger area in the output image and with the position in the entire image at which that object is present.

25. The output device according to claim 24, wherein the output sound generation unit generates an output sound that gives greater emphasis to the sound that the sound storage unit stores in association with the image containing an object in an image arranged further toward the front in the output image and with the position in the entire image at which that object is present.

The output device according to claim 24, wherein the output sound generation unit generates the output sound by synthesizing at a higher volume the sound that the sound storage unit stores in association with the image containing an object in an image arranged further toward the front in the output image and with the position in the entire image at which that object is present.
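One way to picture emphasis by occupied area is to derive a relative volume for each object's sound from the area the object occupies in the output image. The Python sketch below uses a simple proportional rule, which is an assumption; the claims only require that a larger area lead to greater emphasis. The function name area_gains and the object labels are hypothetical.

```python
def area_gains(areas):
    """Map each object's area in the output image to a relative volume.

    areas: dict of object label -> area in pixels.
    Gains are proportional to area and sum to 1 (an assumed rule).
    """
    total = sum(areas.values())
    if total <= 0:
        raise ValueError("total area must be positive")
    return {label: area / total for label, area in areas.items()}


gains = area_gains({"dog": 300, "bird": 100})
```

An emphasis by front-to-back arrangement could be handled the same way by replacing the area with a rank derived from the stacking order.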

An output method comprising:
An image storage step of storing an image;
A sound storage step of storing a sound in association with the image stored in the image storage step and a position in that image;
A partial region range acquisition step of acquiring the range of a partial region including at least a part of the image stored in the image storage step;
An output image generation step of generating an output image from the portion of the image stored in the image storage step that lies within the range of the partial region acquired in the partial region range acquisition step;
An output sound generation step of generating an output sound from the sound stored in the sound storage step in association with a position, in the entire image stored in the image storage step, at which the range of the partial region acquired in the partial region range acquisition step exists; and
An image output step of outputting the output image and the output sound in association with each other so that the output image generated in the output image generation step and the output sound generated in the output sound generation step are output in synchronization.

A program for an output device that outputs an image, the program causing the output device to function as:
An image storage unit that stores an image;
A sound storage unit that stores a sound in association with an image stored in the image storage unit and a position in that image;
A partial region range acquisition unit that acquires the range of a partial region including at least a part of the image stored in the image storage unit;
An output image generation unit that generates an output image from the portion of the image stored in the image storage unit that lies within the range of the partial region acquired by the partial region range acquisition unit;
An output sound generation unit that generates an output sound from the sound that the sound storage unit stores in association with a position, in the entire image stored in the image storage unit, at which the range of the partial region acquired by the partial region range acquisition unit exists; and
An image output unit that outputs the output image and the output sound in association with each other so that the output image generated by the output image generation unit and the output sound generated by the output sound generation unit are output in synchronization.

An output device comprising:
An image storage unit that stores a plurality of images;
A sound storage unit that stores a plurality of sounds in association with each of the plurality of images stored in the image storage unit;
An output image generation unit that generates an output image by combining a plurality of images stored in the image storage unit;
An output sound generation unit that generates an output sound using a first sound and a second sound that the sound storage unit stores in association with a first image and a second image, respectively, included in the output image generated by the output image generation unit; and
An image output unit that outputs the output image and the output sound in association with each other so that the output image generated by the output image generation unit and the output sound generated by the output sound generation unit are output in synchronization,
wherein the output sound generation unit generates an output sound synthesized with the first sound emphasized over the second sound when the first image is emphasized over the second image in the output image generated by the output image generation unit.

The output device according to claim 30, wherein the output sound generation unit generates an output sound synthesized with the first sound emphasized over the second sound when the first image is larger than the second image in the output image generated by the output image generation unit.

The output device according to claim 30, wherein the output sound generation unit generates an output sound synthesized with the first sound emphasized over the second sound when the first image is in front of the second image in the output image generated by the output image generation unit.

The output device according to claim 30, wherein the output sound generation unit generates an output sound synthesized with the first sound emphasized over the second sound when the first image is located closer to the center than the second image in the output image generated by the output image generation unit.

The output device according to claim 30, wherein the output sound generation unit generates an output sound synthesized with the volume of the first sound made higher than the volume of the second sound when the first image is emphasized over the second image in the output image generated by the output image generation unit.
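The synthesis with unequal volumes can be sketched in Python as a weighted sum of two sample sequences. The sketch assumes the caller has already decided which image is emphasized (by size, front-to-back order or position) and passes that image's sound first; the gain values 1.0 and 0.5 and the function name mix are hypothetical.

```python
def mix(emphasized, other, strong=1.0, weak=0.5):
    """Mix two sounds, giving the emphasized image's sound the higher volume.

    emphasized, other: sequences of audio samples (floats).
    The shorter sequence is padded with silence.
    """
    if strong <= weak:
        raise ValueError("strong gain must exceed weak gain")
    length = max(len(emphasized), len(other))
    a = list(emphasized) + [0.0] * (length - len(emphasized))
    b = list(other) + [0.0] * (length - len(other))
    return [strong * x + weak * y for x, y in zip(a, b)]


first_sound = [1.0, 1.0]    # sound stored for the emphasized first image
second_sound = [1.0]        # sound stored for the second image
output_sound = mix(first_sound, second_sound)
```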

An output method comprising:
An image storage step of storing a plurality of images;
A sound storage step of storing a plurality of sounds in association with each of the plurality of images stored in the image storage step;
An output image generation step of generating an output image by combining a plurality of images stored in the image storage step;
An output sound generation step of generating an output sound using a first sound and a second sound stored in the sound storage step in association with a first image and a second image, respectively, included in the output image generated in the output image generation step; and
An image output step of outputting the output image and the output sound in association with each other so that the output image generated in the output image generation step and the output sound generated in the output sound generation step are output in synchronization,
wherein, in the output sound generation step, an output sound synthesized with the first sound emphasized over the second sound is generated when the first image is emphasized over the second image in the output image generated in the output image generation step.

A program for an output device that outputs an image, the program causing the output device to function as:
An image storage unit that stores a plurality of images;
A sound storage unit that stores a plurality of sounds in association with each of the plurality of images stored in the image storage unit;
An output image generation unit that generates an output image by combining a plurality of images stored in the image storage unit;
An output sound generation unit that generates an output sound using a first sound and a second sound that the sound storage unit stores in association with a first image and a second image, respectively, included in the output image generated by the output image generation unit; and
An image output unit that outputs the output image and the output sound in association with each other so that the output image generated by the output image generation unit and the output sound generated by the output sound generation unit are output in synchronization,
the program causing the output sound generation unit to generate an output sound synthesized with the first sound emphasized over the second sound when the first image is emphasized over the second image in the output image generated by the output image generation unit.