JP2006304257A

JP2006304257A - Image pickup device, image pickup method, output device, and output method and program

Info

Publication number: JP2006304257A
Application number: JP2005361061A
Authority: JP
Inventors: Ichigaku Mino; 一学三野
Original assignee: Fuji Photo Film Co Ltd
Current assignee: Fujifilm Holdings Corp
Priority date: 2005-03-25
Filing date: 2005-12-14
Publication date: 2006-11-02

Abstract

PROBLEM TO BE SOLVED: To provide an image pickup device or an output device capable of easily obtaining, together with an image, the music that was being played when the image was picked up.

SOLUTION: The image pickup device includes: an image pickup section; a recording section for recording sound around the image pickup section; a characteristic sound extraction section for extracting a predetermined type of sound from the sound recorded by the recording section; a sound acquisition section for acquiring the same type of sound as the sound extracted by the characteristic sound extraction section from a sound database storing a plurality of types of sound; and a data storage section for storing the sound acquired by the sound acquisition section and an image picked up by the image pickup section in association with each other so that they are output in synchronization. The characteristic sound extraction section may extract the predetermined sound from the sound recorded by the recording section within a preset time from the time when the image pickup section picked up the image.

COPYRIGHT: (C)2007, JPO&INPIT

Description

本発明は、撮像装置、撮像方法、出力装置、及びプログラムに関する。特に本発明は、画像を撮像する撮像装置及び撮像方法、並びに画像を出力する出力装置及び出力方法、並びに当該撮像装置及び当該出力装置用のプログラムに関する。 The present invention relates to an imaging device, an imaging method, an output device, and a program. In particular, the present invention relates to an imaging device and an imaging method for capturing an image, an output device and an output method for outputting an image, and a program for the imaging device and the output device.

従来、静止画だけでなく動画もメモリカードに記録することができるデジタルスチルカメラがある。このようなデジタルスチルカメラでは、静止画や動画の撮影記録時にマイクロホンで検出した音声を画像に対応させて記録できる（例えば、特許文献１参照。）。また、デジタルスチルカメラで撮影した静止画や動画を表示しながら、画像に対応させて記録された音声を再生することができる電子フォトスタンドが知られている。
Conventionally, there are digital still cameras that can record not only still images but also moving images on a memory card. In such a digital still camera, sound detected by a microphone at the time of shooting and recording of a still image or a moving image can be recorded in association with the image (see, for example, Patent Document 1). There is also known an electronic photo stand that can reproduce sound recorded in correspondence with an image while displaying still images and moving images taken with a digital still camera.
Patent Document 1: 特開平７−１５４７３４号公報 (JP-A-7-154734)

このようなカメラ及び電子フォトスタンドによって、例えば、撮影記録時に周囲で流れていた音楽とともに、撮影した画像を鑑賞することができる。しかし、このようなカメラでは、撮影記録時に流れていた音声しか記録できない。他にも、例えばインターネット、ＣＤ等からデジタルデータとして取得することのできる音楽データ等に比べて、マイクロホンから録音された音楽は低音質である場合が多い。このため、ユーザは再生される音楽に不満を持ってしまう場合がある。また、ユーザにとっては画像と音声の編集等の煩雑な作業をすることなく、撮影した画像と音楽とを簡単に鑑賞することができることが望ましい。また、音楽の他にも、波の音、鳥の鳴き声等、撮像時の環境音と同種の音声をより高い音質で画像とともに鑑賞することができることが望ましい。 With such a camera and an electronic photo stand, for example, a photographed image can be viewed together with music that has been flowing around during photographing recording. However, such a camera can only record the sound that was flowing at the time of shooting and recording. In addition, music recorded from a microphone often has lower sound quality than music data that can be acquired as digital data from, for example, the Internet or a CD. For this reason, the user may be dissatisfied with the music to be played. In addition, it is desirable for the user to be able to easily appreciate the captured image and music without performing complicated operations such as editing of the image and sound. In addition to music, it is desirable to be able to appreciate the same kind of sound as the environmental sound at the time of imaging, such as the sound of waves and the sound of birds, together with images with higher sound quality.

そこで本発明は、上記の課題を解決することができる撮像装置、撮像方法、出力装置、出力方法、及びプログラムを提供することを目的とする。この目的は特許請求の範囲における独立項に記載の特徴の組み合わせにより達成される。また従属項は本発明の更なる有利な具体例を規定する。 Accordingly, an object of the present invention is to provide an imaging device, an imaging method, an output device, an output method, and a program that can solve the above-described problems. This object is achieved by a combination of features described in the independent claims. The dependent claims define further advantageous specific examples of the present invention.

本発明の第１の形態における撮像装置は、撮像部と、撮像部の周囲の音声を録音する録音部と、録音部が録音した音声から予め定められた種類の音声を抽出する特徴音抽出部と、複数の種類の音声を格納する音声データベースから、特徴音抽出部が抽出した音声と同一の種類の音声を取得する音声取得部と、音声取得部が取得した音声と撮像部が撮像した画像とを同期して出力させるべく対応づけて格納するデータ格納部とを備える。 An imaging apparatus according to a first aspect of the present invention includes: an imaging unit; a recording unit that records sound around the imaging unit; a feature sound extraction unit that extracts a predetermined type of sound from the sound recorded by the recording unit; a sound acquisition unit that acquires, from a sound database storing a plurality of types of sound, sound of the same type as the sound extracted by the feature sound extraction unit; and a data storage unit that stores the sound acquired by the sound acquisition unit and the image captured by the imaging unit in association with each other so that they are output in synchronization.

特徴音抽出部は、撮像部が画像を撮像した時刻から予め設定された時間内に、録音部が録音した音声から予め定められた種類の音声を抽出してよい。 The feature sound extraction unit may extract a predetermined type of sound from the sound recorded by the recording unit within a preset time from the time when the image capturing unit captures the image.

撮像部が有する受光素子により受光した光の画像を表示する表示部と、表示部が画像を表示している状態の動作モードである撮像モード、又は表示部が画像を表示していない状態の動作モードである非撮像モードに当該撮像装置を設定するモード設定部とをさらに備え、録音部は、モード設定部が撮像モードに設定している場合、及びモード設定部が非撮像モードに設定している場合の双方において、撮像部の周囲の音声を録音してよい。 The imaging apparatus may further include: a display unit that displays an image of light received by a light receiving element of the imaging unit; and a mode setting unit that sets the imaging apparatus to either an imaging mode, which is an operation mode in which the display unit displays the image, or a non-imaging mode, which is an operation mode in which the display unit does not display the image. The recording unit may record sound around the imaging unit both when the mode setting unit has set the imaging mode and when it has set the non-imaging mode.

特徴音抽出部は、モード設定部が撮像モードに設定している時間を含み、モード設定部が撮像モードに設定している時間より長い、予め設定された時間内に、録音部が録音した音声から予め定められた種類の音声を抽出してよい。音声データベースは、複数の音楽を格納しており、特徴音抽出部は、録音部が録音した音声から音楽を抽出し、音声取得部は、音声データベースから、特徴音抽出部が抽出した音楽と同一の音楽を取得してよい。 The feature sound extraction unit may extract the predetermined type of sound from sound recorded by the recording unit within a preset period that includes, and is longer than, the time during which the mode setting unit has set the imaging mode. The sound database may store a plurality of pieces of music, the feature sound extraction unit may extract music from the sound recorded by the recording unit, and the sound acquisition unit may acquire, from the sound database, the same music as the music extracted by the feature sound extraction unit.

特徴音抽出部が抽出する環境音のそれぞれの種類を特定する条件を予め格納する条件格納部をさらに備え、音声データベースは、環境音の種類別に複数の環境音を格納しており、特徴音抽出部は、条件格納部が格納する条件に一致する環境音を、録音部が録音した音声から抽出し、音声取得部は、音声データベースから、特徴音抽出部が抽出した環境音と同一の種類の環境音を取得し、データ格納部は、音声取得部が取得した環境音と撮像部が撮像した画像とを同期して出力させるべく対応づけて格納してよい。 The imaging apparatus may further include a condition storage unit that stores in advance conditions for identifying each type of environmental sound to be extracted by the feature sound extraction unit. The sound database may store a plurality of environmental sounds for each type of environmental sound, the feature sound extraction unit may extract, from the sound recorded by the recording unit, an environmental sound that matches a condition stored in the condition storage unit, the sound acquisition unit may acquire, from the sound database, an environmental sound of the same type as the environmental sound extracted by the feature sound extraction unit, and the data storage unit may store the environmental sound acquired by the sound acquisition unit and the image captured by the imaging unit in association with each other so that they are output in synchronization.

音声データベースは、時代別に複数の音楽を格納しており、特徴音抽出部は、録音部が録音した音声から音楽を抽出し、音声取得部は、音声データベースから、特徴音抽出部が抽出した音楽と同じ時代の音楽を取得してよい。音声データベースは、ジャンル別に複数の音楽を格納しており、音声取得部は、音声データベースから、特徴音抽出部が抽出した音楽と同じジャンルの音楽を取得してよい。 The sound database may store a plurality of pieces of music by era, the feature sound extraction unit may extract music from the sound recorded by the recording unit, and the sound acquisition unit may acquire, from the sound database, music of the same era as the music extracted by the feature sound extraction unit. The sound database may store a plurality of pieces of music by genre, and the sound acquisition unit may acquire, from the sound database, music of the same genre as the music extracted by the feature sound extraction unit.

本発明の第２の形態における撮像方法は、画像を撮像部により撮像する撮像段階と、撮像部の周囲の音声を録音する録音段階と、録音段階において録音された音声から予め定められた種類の音声を抽出する特徴音抽出段階と、複数の種類の音声を格納する音声データベースから、特徴音抽出段階において抽出された音声と同一の種類の音声を取得する音声取得段階と、音声取得段階において取得された音声と撮像部が撮像した画像とを同期して出力させるべく対応づけて格納するデータ格納段階とを備える。 An imaging method according to a second aspect of the present invention includes: an imaging stage of capturing an image with an imaging unit; a recording stage of recording sound around the imaging unit; a feature sound extraction stage of extracting a predetermined type of sound from the sound recorded in the recording stage; a sound acquisition stage of acquiring, from a sound database storing a plurality of types of sound, sound of the same type as the sound extracted in the feature sound extraction stage; and a data storage stage of storing the sound acquired in the sound acquisition stage and the image captured by the imaging unit in association with each other so that they are output in synchronization.

本発明の第３の形態によると、画像を撮像する撮像装置用のプログラムであって、撮像装置を、画像を撮像する撮像部、撮像部の周囲の音声を録音する録音部、録音部が録音した音声から予め定められた種類の音声を抽出する特徴音抽出部、複数の種類の音声を格納する音声データベースから、特徴音抽出部が抽出した音声と同一の種類の音声を取得する音声取得部、音声取得部が取得した音声と撮像部が撮像した画像とを同期して出力させるべく対応づけて格納するデータ格納部として機能させる。 According to a third aspect of the present invention, there is provided a program for an imaging apparatus that captures an image, the program causing the imaging apparatus to function as: an imaging unit that captures an image; a recording unit that records sound around the imaging unit; a feature sound extraction unit that extracts a predetermined type of sound from the sound recorded by the recording unit; a sound acquisition unit that acquires, from a sound database storing a plurality of types of sound, sound of the same type as the sound extracted by the feature sound extraction unit; and a data storage unit that stores the sound acquired by the sound acquisition unit and the image captured by the imaging unit in association with each other so that they are output in synchronization.

本発明の第４の形態における出力装置は、撮像装置によって撮像された画像を格納する画像格納部と、撮像装置によって録音された音声を格納する音声格納部と、音声格納部が格納する音声から予め定められた種類の音声を抽出する特徴音抽出部と、複数の種類の音声を格納する音声データベースから、特徴音抽出部が抽出した音声と同一の種類の音声を取得する音声取得部と、音声取得部が取得した音声と画像格納部が格納する画像とを同期して出力する出力部とを備える。 An output device according to a fourth aspect of the present invention includes: an image storage unit that stores an image captured by an imaging apparatus; a sound storage unit that stores sound recorded by the imaging apparatus; a feature sound extraction unit that extracts a predetermined type of sound from the sound stored in the sound storage unit; a sound acquisition unit that acquires, from a sound database storing a plurality of types of sound, sound of the same type as the sound extracted by the feature sound extraction unit; and an output unit that outputs the sound acquired by the sound acquisition unit and the image stored in the image storage unit in synchronization.

画像格納部は、画像に対応づけて当該画像の撮像時刻を格納しており、音声格納部は、音声に対応づけて当該音声の録音時刻を格納しており、特徴音抽出部は、画像が撮像された時刻から予め設定された許容時間内に録音された音声から予め定められた種類の音声を抽出してよい。 The image storage unit may store, in association with each image, the time at which the image was captured, the sound storage unit may store, in association with each sound, the time at which the sound was recorded, and the feature sound extraction unit may extract the predetermined type of sound from sound recorded within a preset allowable time from the time at which the image was captured.

画像格納部が格納する画像の出力要求を取得する出力要求取得部と、出力要求取得部が出力要求を取得した時刻と、画像格納部が格納する画像の撮像時刻との差がより大きい場合に、許容時間をより長く設定する許容時間設定部とをさらに備えてよい。 The output device may further include: an output request acquisition unit that acquires a request to output an image stored in the image storage unit; and an allowable time setting unit that sets the allowable time longer when the difference between the time at which the output request acquisition unit acquired the output request and the capture time of the image stored in the image storage unit is larger.
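The allowable-time behaviour described above can be illustrated with a minimal Python sketch. The description only states that the allowable time is set longer as the gap between the output request and the capture time grows; the linear growth rule, the default values, the symmetric window around the capture time, and the names `allowable_time` and `recordings_in_window` are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def allowable_time(capture_time, request_time,
                   base=timedelta(minutes=10),
                   growth_per_year=timedelta(hours=1)):
    """Return a tolerance that grows with the age of the image.

    The linear rule and both default values are hypothetical; the source
    only requires that an older image gets a longer allowable time.
    """
    age = max(request_time - capture_time, timedelta(0))
    years = age / timedelta(days=365)  # timedelta / timedelta gives a float
    return base + growth_per_year * years


def recordings_in_window(recordings, capture_time, tolerance):
    """Keep (recorded_at, label) pairs recorded within the tolerance.

    A window on both sides of the capture time is an assumption.
    """
    return [r for r in recordings if abs(r[0] - capture_time) <= tolerance]


capture = datetime(2005, 3, 25, 12, 0)
recordings = [
    (datetime(2005, 3, 25, 12, 5), "song near the shot"),
    (datetime(2005, 3, 25, 13, 30), "song later that day"),
]
soon = allowable_time(capture, datetime(2005, 3, 26))
late = allowable_time(capture, datetime(2015, 3, 25))
print(len(recordings_in_window(recordings, capture, soon)))
print(len(recordings_in_window(recordings, capture, late)))
```

With these assumed values, a request made the day after capture considers only the recording made close to the shot, while a request made ten years later also admits the recording from later the same day.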

本発明の第５の形態における出力方法は、撮像装置によって撮像された画像を格納する画像格納段階と、撮像装置によって録音された音声を格納する音声格納段階と、音声格納段階において格納される音声から予め定められた種類の音声を抽出する特徴音抽出段階と、複数の音楽を格納する音声データベースから、特徴音抽出段階において抽出された音声と同一の種類の音声を取得する音声取得段階と、音声取得段階において取得された音声と画像格納段階において格納される画像とを同期して出力する出力段階とを備える。 An output method according to a fifth aspect of the present invention includes: an image storage stage of storing an image captured by an imaging apparatus; a sound storage stage of storing sound recorded by the imaging apparatus; a feature sound extraction stage of extracting a predetermined type of sound from the sound stored in the sound storage stage; a sound acquisition stage of acquiring, from a sound database storing a plurality of pieces of music, sound of the same type as the sound extracted in the feature sound extraction stage; and an output stage of outputting the sound acquired in the sound acquisition stage and the image stored in the image storage stage in synchronization.

本発明の第６の形態によると、画像を出力する出力装置用のプログラムであって、出力装置を、画像を撮像する撮像部、撮像部の周囲の音声を録音する録音部、録音部が録音した音声から予め定められた種類の音声を抽出する特徴音抽出部、複数の種類の音声を格納する音声データベースから、特徴音抽出部が抽出した音声と同一の種類の音声を取得する音声取得部、音声取得部が取得した音声と撮像部が撮像した画像とを同期して出力させるべく対応づけて格納するデータ格納部として機能させる。 According to a sixth aspect of the present invention, there is provided a program for an output device that outputs an image, the program causing the output device to function as: an imaging unit that captures an image; a recording unit that records sound around the imaging unit; a feature sound extraction unit that extracts a predetermined type of sound from the sound recorded by the recording unit; a sound acquisition unit that acquires, from a sound database storing a plurality of types of sound, sound of the same type as the sound extracted by the feature sound extraction unit; and a data storage unit that stores the sound acquired by the sound acquisition unit and the image captured by the imaging unit in association with each other so that they are output in synchronization.

なお上記の発明の概要は、本発明の必要な特徴の全てを列挙したものではなく、これらの特徴群のサブコンビネーションもまた発明となりうる。 Note that the above summary of the invention does not enumerate all the necessary features of the present invention, and sub-combinations of these feature groups can also be the invention.

本発明によれば、画像を撮像したときに流れていた種類の音声を、画像とともに容易に得ることができる撮像装置及び出力装置を提供することができる。 According to the present invention, it is possible to provide an imaging apparatus and an output device with which sound of the type that was playing when an image was captured can easily be obtained together with the image.

以下、発明の実施形態を通じて本発明を説明するが、以下の実施形態は特許請求の範囲に係る発明を限定するものではなく、また実施形態の中で説明されている特徴の組み合わせの全てが発明の解決手段に必須であるとは限らない。 Hereinafter, the present invention will be described through embodiments of the invention. However, the following embodiments do not limit the claimed invention, and all combinations of features described in the embodiments are inventions. It is not always essential to the solution.

図１は、本発明の一実施形態に係る音声提供システムの一例を示す。音声提供システムは、撮像装置１００、出力装置１４０、説明音データベース１７０、音楽データベース１７２、及び環境音データベース１７４を有する。この例では、撮像装置１００は、観光地において記念写真を撮像する。撮像装置１００は、撮像装置１００が撮像した画像及び撮像したときの撮像位置を、インターネット等の通信回線１５０を通じて出力装置１４０に送信する。音楽データベース１７２は、複数の音楽を格納している。また、環境音データベース１７４は、環境音の種類別に複数の環境音を格納している。なお、本実施形態における音楽データベース１７２及び環境音データベース１７４は、この発明における音声データベースの一例である。 FIG. 1 shows an example of a voice providing system according to an embodiment of the present invention. The voice providing system includes an imaging device 100, an output device 140, an explanatory sound database 170, a music database 172, and an environmental sound database 174. In this example, the imaging device 100 captures a commemorative photo at a sightseeing spot. The image capturing apparatus 100 transmits the image captured by the image capturing apparatus 100 and the image capturing position when the image capturing is performed to the output apparatus 140 through the communication line 150 such as the Internet. The music database 172 stores a plurality of music. The environmental sound database 174 stores a plurality of environmental sounds for each type of environmental sound. Note that the music database 172 and the environmental sound database 174 in this embodiment are examples of the audio database in the present invention.

出力装置１４０は、撮像装置１００から受け取った撮像位置に関する音声データ、例えば観光地の特色を説明する音声データを説明音データベース１７０から取得して、撮像装置１００から受け取った画像とともに出力する。なお、撮像装置１００は、撮像装置１００の周囲の音声を録音しておき、出力装置１４０は、撮像装置１００で録音された音声を受け取ってもよい。そして出力装置１４０は、当該音声の中から音楽を抽出して、抽出した音楽と同じ音楽を音楽データベース１７２から取得して、画像とともに出力してもよい。また、出力装置１４０は、当該音声の中から波の音、鳥の鳴き声等の環境音を抽出して、抽出した環境音と同じ種類の環境音を環境音データベース１７４から取得して、画像とともに出力してもよい。なお、出力装置１４０は、説明音データベース１７０から取得した音声データと、音楽データベース１７２から取得した音楽又は環境音データベース１７４から取得した環境音とを同時に出力してもよい。 The output device 140 acquires audio data relating to the imaging position received from the imaging device 100, for example, audio data describing the feature of the tourist spot from the explanatory sound database 170, and outputs it together with the image received from the imaging device 100. Note that the imaging apparatus 100 may record voices around the imaging apparatus 100, and the output apparatus 140 may receive voices recorded by the imaging apparatus 100. The output device 140 may extract music from the sound, acquire the same music as the extracted music from the music database 172, and output the music together with the image. Further, the output device 140 extracts environmental sounds such as wave sounds and bird calls from the sound, acquires environmental sounds of the same type as the extracted environmental sounds from the environmental sound database 174, and together with the images. It may be output. Note that the output device 140 may simultaneously output the sound data acquired from the explanatory sound database 170 and the music acquired from the music database 172 or the environmental sound acquired from the environmental sound database 174.

出力装置１４０は、例えば、ＨＤＴＶ、電子フォトスタンド、コンピュータ等の、画像及び音声を出力する装置であってよい。また、出力装置１４０は、音声を文字として出力してもよい。例えば、出力装置１４０は、液晶等の表示デバイスに画像を表示するときに、説明音データベース１７０から取得した音声及び／又は音楽データベース１７２から取得した音楽データに含まれる歌詞等を文字として表示デバイスに表示させてよい。なお、出力装置１４０は、画像を表示させる表示デバイスに文字を表示させてよく、画像を表示させる表示デバイスとは別の表示デバイスに文字を表示させてもよい。他にも、出力装置１４０は、プリンタ等の画像を印刷する印刷装置であってもよく、画像を印刷するとともに音声を文字として印刷してもよい。 The output device 140 may be a device that outputs images and sounds, such as an HDTV, an electronic photo stand, and a computer. Further, the output device 140 may output the voice as characters. For example, when the output device 140 displays an image on a display device such as a liquid crystal display, the voice acquired from the explanatory sound database 170 and / or the lyrics included in the music data acquired from the music database 172 are displayed on the display device as characters. You may display. Note that the output device 140 may display characters on a display device that displays an image, and may display characters on a display device that is different from the display device that displays an image. In addition, the output device 140 may be a printing device that prints an image, such as a printer, and may print the image and sound as characters.

撮像装置１００は、例えば、デジタルスチルカメラ、カメラ付携帯電話等であってよい。また、撮像装置１００が画像及び音声データを記録媒体に記録して、出力装置１４０は当該記録媒体から画像及び音声データを受け取ってもよい。また、撮像装置１００は、画像及び音声データを、通信回線１５０に接続されたサーバの、ユーザ１８０毎にそれぞれ設けられたディレクトリ、例えば撮像装置１００と関連付けられたディレクトリに格納してもよい。そして出力装置１４０は、ユーザ１８０毎にサーバに格納された画像及び音声データを受け取ってもよい。 The imaging device 100 may be, for example, a digital still camera, a camera-equipped mobile phone, or the like. Further, the image capturing apparatus 100 may record image and sound data on a recording medium, and the output apparatus 140 may receive the image and sound data from the recording medium. In addition, the imaging apparatus 100 may store the image and audio data in a directory provided for each user 180 of the server connected to the communication line 150, for example, a directory associated with the imaging apparatus 100. The output device 140 may receive image and sound data stored in the server for each user 180.

以上説明した出力装置１４０によれば、撮像装置１００で撮像した画像を、撮像した場所に関する音声とともにユーザ１８０に提供することができる。このため、ユーザ１８０は、観光地等の特色等を思い出しながら、楽しく画像を鑑賞することができる。また、撮像装置１００で撮像したときに周囲で流れていた音楽、周囲の波の音等の環境音等を、ユーザ１８０に提供することができる。このため、ユーザ１８０は、観光地等を訪れたときに流行していた音楽を聴きながら、楽しく画像を鑑賞することができる。 According to the output device 140 described above, an image captured by the imaging device 100 can be provided to the user 180 together with sound related to the place where it was captured. The user 180 can therefore enjoy viewing the image while recalling the features of the tourist spot or the like. In addition, the music that was playing nearby when the image was captured by the imaging device 100, and environmental sounds such as the sound of nearby waves, can be provided to the user 180. The user 180 can therefore enjoy viewing the image while listening to the music that was popular at the time of the visit to the tourist spot or the like.

図２は、出力装置１４０のブロック構成の一例を示す。出力装置１４０は、画像格納部２１０、画像選択部２７８、撮像領域判断部２８２、撮像期間判断部２８４、撮像位置分布算出部２８６、撮像枚数算出部２８８、出力部２２４、及び音声取得部２６２を備える。 FIG. 2 shows an example of a block configuration of the output device 140. The output device 140 includes an image storage unit 210, an image selection unit 278, an imaging region determination unit 282, an imaging period determination unit 284, an imaging position distribution calculation unit 286, an imaging number calculation unit 288, an output unit 224, and a sound acquisition unit 262.

画像格納部２１０は、撮像された画像と、当該画像が撮像された位置とを対応づけて格納する。例えば、撮像装置１００は、画像を撮像したときの撮像装置１００の位置における緯度及び経度情報をＧＰＳ衛星から受信する。そして、画像格納部２１０は、撮像装置１００から受け取った画像を、撮像装置１００が検出した緯度及び経度情報と対応づけて格納する。 The image storage unit 210 stores the captured image and the position where the image is captured in association with each other. For example, the imaging device 100 receives latitude and longitude information at the position of the imaging device 100 when an image is captured from a GPS satellite. The image storage unit 210 stores the image received from the imaging device 100 in association with the latitude and longitude information detected by the imaging device 100.

画像選択部２７８は、画像格納部２１０が格納する画像から、ユーザ１８０の指示入力に基づいて複数の画像を選択する。撮像領域判断部２８２は、画像格納部２１０が格納する複数の画像のそれぞれが撮像された複数の撮像位置を含む撮像領域を判断する。 The image selection unit 278 selects a plurality of images from the images stored in the image storage unit 210 based on an instruction input from the user 180. The imaging region determination unit 282 determines an imaging region including a plurality of imaging positions where each of the plurality of images stored in the image storage unit 210 is captured.

具体的には、撮像領域判断部２８２は、画像選択部２７８が選択した複数の画像のそれぞれが撮像された複数の位置を含む撮像領域を判断する。例えば、撮像領域判断部２８２は、画像格納部２１０が格納する複数の画像のそれぞれが撮像された複数の撮像位置を含む撮像領域の地理的な範囲、例えば緯度及び経度の範囲を判断してよい。 Specifically, the imaging region determination unit 282 determines an imaging region including a plurality of positions where each of the plurality of images selected by the image selection unit 278 is captured. For example, the imaging region determination unit 282 may determine a geographical range of an imaging region including a plurality of imaging positions where each of the plurality of images stored in the image storage unit 210 is captured, for example, latitude and longitude ranges. .
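As a rough illustration of how a latitude and longitude range covering the capture positions might be determined, the following Python sketch computes a bounding box. The function name `imaging_region` and the sample coordinates are hypothetical, and the sketch assumes the positions do not straddle the 180-degree meridian.

```python
def imaging_region(positions):
    """Return ((min_lat, min_lon), (max_lat, max_lon)) covering all positions.

    `positions` is a list of (latitude, longitude) pairs in degrees.
    A plain bounding box is an assumption; the source only says that a
    geographical range containing the capture positions is determined.
    """
    if not positions:
        raise ValueError("at least one capture position is required")
    lats = [lat for lat, _ in positions]
    lons = [lon for _, lon in positions]
    return (min(lats), min(lons)), (max(lats), max(lons))


# Approximate, illustrative coordinates for three capture positions.
selected = [(32.75, 129.87), (35.01, 135.77), (43.06, 141.35)]
print(imaging_region(selected))
```

The size of the returned box could then serve as the measure of how wide the imaging region is when choosing between an overview and a more detailed explanation.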

撮像位置分布算出部２８６は、撮像領域判断部２８２が判断した撮像領域内における、画像格納部２１０が格納する複数の画像のそれぞれが撮像された複数の位置の分布を算出する。撮像枚数算出部２８８は、撮像領域判断部２８２が判断した撮像領域に含まれる複数の部分領域毎に、画像格納部２１０が格納する複数の画像のそれぞれが撮像された撮像枚数を算出する。 The imaging position distribution calculation unit 286 calculates a distribution of a plurality of positions at which each of a plurality of images stored in the image storage unit 210 in the imaging region determined by the imaging region determination unit 282 is captured. The number of captured images calculation unit 288 calculates the number of captured images in which each of the plurality of images stored in the image storage unit 210 is captured for each of the plurality of partial regions included in the imaging region determined by the imaging region determination unit 282.

音声取得部２６２は、撮像領域判断部２８２が判断した撮像領域の広さに応じて説明音データベース１７０が格納する音声を取得する。具体的には、音声取得部２６２は、撮像領域判断部２８２が判断した撮像領域がより狭い場合に、撮像領域判断部２８２が判断した撮像領域についてのより詳しい説明の音声を取得する。 The voice acquisition unit 262 acquires the voice stored in the explanatory sound database 170 according to the size of the imaging area determined by the imaging area determination unit 282. Specifically, when the imaging area determined by the imaging area determination unit 282 is narrower, the audio acquisition unit 262 acquires a more detailed description of the audio regarding the imaging area determined by the imaging area determination unit 282.

より具体的には、音声取得部２６２は、撮像位置分布算出部２８６が算出した分布の偏りが予め定められた偏りより大きい部分領域についての説明の音声を取得する。また、音声取得部２６２は、撮像枚数算出部２８８が算出した撮像枚数がより多い場合に、説明音データベース１７０が格納する部分領域についての詳しさが異なる複数の説明のうちのより詳しい説明の音声を取得する。 More specifically, the sound acquisition unit 262 acquires explanatory sound for a partial region in which the bias of the distribution calculated by the imaging position distribution calculation unit 286 is larger than a predetermined bias. In addition, when the number of captured images calculated by the imaging number calculation unit 288 is larger, the sound acquisition unit 262 acquires the more detailed of the plurality of explanations, differing in level of detail, that the explanatory sound database 170 stores for the partial region.

出力部２２４は、画像格納部２１０が格納する複数の画像とともに、音声取得部２６２が取得した音声を出力する。具体的には、出力部２２４は、画像選択部２７８が選択した複数の画像とともに、音声取得部２６２が取得した音声を出力する。 The output unit 224 outputs the sound acquired by the sound acquisition unit 262 together with the plurality of images stored in the image storage unit 210. Specifically, the output unit 224 outputs the sound acquired by the sound acquisition unit 262 together with the plurality of images selected by the image selection unit 278.

画像格納部２１０は、撮像された画像に対応づけて、当該画像が撮像された時刻をさらに格納する。撮像期間判断部２８４は、画像格納部２１０が格納する複数の画像のそれぞれが撮像された複数の時刻を含む撮像期間を判断する。具体的には、撮像期間判断部２８４は、画像選択部２７８が選択した複数の画像が撮像された複数の時刻を含む撮像期間を判断する。 The image storage unit 210 further stores the time when the image was captured in association with the captured image. The imaging period determination unit 284 determines an imaging period including a plurality of times when each of the plurality of images stored in the image storage unit 210 is captured. Specifically, the imaging period determination unit 284 determines an imaging period including a plurality of times when a plurality of images selected by the image selection unit 278 are captured.

そして、音声取得部２６２は、撮像期間判断部２８４が判断した撮像期間の長さにさらに応じて説明音データベース１７０が格納する音声を取得する。具体的には、音声取得部２６２は、撮像期間判断部２８４が判断した撮像期間がより長い場合に、撮像領域判断部２８２が判断した撮像領域についてのより詳しい説明の音声を取得する。 Then, the voice acquisition unit 262 acquires the voice stored in the explanatory sound database 170 according to the length of the imaging period determined by the imaging period determination unit 284. Specifically, when the imaging period determined by the imaging period determination unit 284 is longer, the voice acquisition unit 262 acquires a more detailed explanation voice regarding the imaging region determined by the imaging region determination unit 282.

図３は、説明音データベース１７０が格納するデータの一例をテーブル形式で示す。説明音データベース１７０は、領域に対応づけて、領域の位置する範囲を示す位置範囲及び領域についての音声データを格納する。位置範囲は、例えば、領域が含まれる領域の起点となる緯度及び経度、並びに、終点となる緯度及び経度を含んでよい。なお、位置範囲には、領域の位置する複数の範囲を含んでよい。音声データは、例えば、各領域についての詳しさの異なる複数の音声データ、例えば概要説明、詳細説明の音声データを含んでよい。また、音声データには、各領域に関するニュース等を含んでもよい。 FIG. 3 shows an example of data stored in the explanatory sound database 170 in a table format. The explanatory sound database 170 stores a position range indicating a range where the region is located and sound data regarding the region in association with the region. The position range may include, for example, the latitude and longitude that are the starting point of the region including the region, and the latitude and longitude that are the end point. Note that the position range may include a plurality of ranges where the region is located. The audio data may include, for example, a plurality of audio data with different details about each region, for example, audio data with an outline description and a detailed description. Further, the audio data may include news about each area.
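One way to picture the table of FIG. 3 is the following Python sketch, in which each region holds one or more position ranges and audio data at several levels of detail. The class name `RegionEntry`, the field names, the file names, and the coordinate boxes are all hypothetical; the boxes are rough illustrations, not real boundaries.

```python
from dataclasses import dataclass, field


@dataclass
class RegionEntry:
    name: str
    # Each range is ((start_lat, start_lon), (end_lat, end_lon)).
    ranges: list
    # Maps a detail level such as "overview" or "detailed" to audio data.
    audio: dict = field(default_factory=dict)

    def contains(self, lat, lon):
        """True if the point lies inside any of the region's ranges."""
        return any(s[0] <= lat <= e[0] and s[1] <= lon <= e[1]
                   for s, e in self.ranges)


# Illustrative entries only; coordinates and file names are made up.
DATABASE = [
    RegionEntry("Japan", [((24.0, 122.0), (46.0, 146.0))],
                {"overview": "japan_overview.wav",
                 "detailed": "japan_detailed.wav"}),
    RegionEntry("Nagasaki", [((32.5, 128.5), (33.5, 130.5))],
                {"overview": "nagasaki_overview.wav",
                 "detailed": "nagasaki_detailed.wav"}),
]


def regions_containing(db, lat, lon):
    """Return the names of all regions whose position range holds the point."""
    return [entry.name for entry in db if entry.contains(lat, lon)]


print(regions_containing(DATABASE, 32.75, 129.87))
```

Because a region may hold several ranges, an area made up of separate parts can be stored as a single entry, which matches the note that the position range may include a plurality of ranges.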

例えば、日本についての概要説明データとは、日本全体の特色、人口、面積等を説明する音声データであってよい。また、日本についての詳細を説明する音声データとは、日本に含まれる各地域の特色、人口、面積等を説明する音声データであってよく、日本の歴史等を説明する音声データであってもよい。 For example, the overview data for Japan may be audio data explaining the characteristics, population, area, and so on of Japan as a whole. The audio data explaining Japan in detail may be audio data explaining the characteristics, population, area, and so on of each region within Japan, or audio data explaining the history of Japan and the like.

図４は、画像が撮像された位置の分布の一例を示す。図４の例では、画像格納部２１０は、長崎、広島、京都、静岡、東京、及び北海道で撮像した、それぞれ１０枚、９枚、１０枚、７枚、８枚、及び６枚の画像を格納している。撮像領域判断部２８２は、例えば、予め定めた緯度及び経度範囲毎の部分領域に区切り、画像が撮像された位置を含む部分領域を判断する。撮像枚数算出部２８８は、部分領域のそれぞれで撮像された画像の枚数を算出する。そして、撮像位置分布算出部２８６は、画像が撮像された位置が含まれる部分領域の分布を判断する。 FIG. 4 shows an example of the distribution of positions at which images were captured. In the example of FIG. 4, the image storage unit 210 stores 10, 9, 10, 7, 8, and 6 images captured in Nagasaki, Hiroshima, Kyoto, Shizuoka, Tokyo, and Hokkaido, respectively. The imaging region determination unit 282, for example, divides the area into partial regions each covering a predetermined latitude and longitude range, and determines the partial regions that include positions at which images were captured. The imaging number calculation unit 288 calculates the number of images captured in each of the partial regions. The imaging position distribution calculation unit 286 then determines the distribution of the partial regions that include positions at which images were captured.

そして、音声取得部２６２は、撮像位置が広い範囲に分布しているか否かを判断する。例えば、音声取得部２６２は、長崎で撮像した画像が画像選択部２７８によって選択されたときには、撮像位置が狭い範囲に分布していると判断して、日本をより詳しく説明する音声データとして、例えば長崎を説明する音声データを説明音データベース１７０から取得する。また、長崎、広島、京都、静岡、東京、及び北海道で撮像した画像が画像選択部２７８によって選択された場合には、撮像位置がより広い範囲に分布していると判断して、撮像領域についての概要を説明する音声データとして、例えば日本を説明する音声データを説明音データベース１７０から取得する。 Then, the sound acquisition unit 262 determines whether the imaging positions are distributed over a wide range. For example, when the image picked up in Nagasaki is selected by the image selection unit 278, the sound acquisition unit 262 determines that the image pickup positions are distributed in a narrow range, and as sound data explaining Japan in more detail, for example, Audio data describing Nagasaki is acquired from the explanatory sound database 170. When images picked up in Nagasaki, Hiroshima, Kyoto, Shizuoka, Tokyo, and Hokkaido are selected by the image selection unit 278, it is determined that the image pickup positions are distributed over a wider range, For example, voice data explaining Japan is acquired from the explanation sound database 170 as voice data explaining the outline of the above.

なお、撮像枚数算出部２８８によって算出される枚数がより多い場合には、音声取得部２６２は、それぞれの撮像領域についてのより詳しい説明の音声データを説明音データベース１７０から取得し、枚数がより少ない場合は、それぞれの撮像領域についてのより概要を説明する音声データを説明音データベース１７０から取得してよい。 Note that when the number of images calculated by the imaging number calculation unit 288 is larger, the voice acquisition unit 262 acquires more detailed explanation audio data for each imaging region from the explanation sound database 170, and the number of images is smaller. In such a case, audio data that explains the outline of each imaging region may be acquired from the explanatory sound database 170.

また、音声取得部２６２は、選択された枚数に対する、特定の部分領域で撮像された画像の枚数の比率を計算して、当該比率が予め定めた比率よりも大きい場合に、当該特定の部分領域について説明する音声データを説明音データベース１７０から取得する。例えば、画像格納部２１０が、長崎、広島、京都、東京、及び北海道で撮像した、それぞれ６枚、７枚、３０枚、４枚、３枚の合計５０枚の画像を格納しており、半数以上の３０枚が京都で撮像された画像である場合には、音声取得部２６２は、京都を説明する音声データを説明音データベース１７０から取得する。このため、出力装置１４０は、ユーザ１８０が特に多く撮像した場所について説明する音声をユーザ１８０に提供することができる。 In addition, the sound acquisition unit 262 calculates the ratio of the number of images captured in a specific partial area to the number of selected images, and when this ratio is larger than a predetermined ratio, acquires sound data describing that partial area from the explanatory sound database 170. For example, suppose the image storage unit 210 stores a total of 50 images captured in Nagasaki, Hiroshima, Kyoto, Tokyo, and Hokkaido (6, 7, 30, 4, and 3 images, respectively). Because 30 of them, more than half, were captured in Kyoto, the sound acquisition unit 262 acquires sound data describing Kyoto from the explanatory sound database 170. The output device 140 can therefore provide the user 180 with sound describing the place where the user 180 captured especially many images.
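The ratio test described above can be illustrated with a short sketch. This is only an illustration under stated assumptions: the function name `dominant_region`, the default threshold of 0.5, and the representation of each image by a region label are hypothetical and do not appear in the specification.

```python
from collections import Counter


def dominant_region(regions, min_ratio=0.5):
    """Return the region whose share of the selected images exceeds min_ratio.

    `regions` holds one region label per selected image (hypothetical input
    format). Returns None when no region exceeds the threshold.
    """
    if not regions:
        return None
    counts = Counter(regions)
    region, count = counts.most_common(1)[0]
    # The text requires the ratio to be larger than the predetermined ratio.
    return region if count / len(regions) > min_ratio else None


# Image counts from the example in the text: 6, 7, 30, 4, and 3 (50 in total).
shots = (["Nagasaki"] * 6 + ["Hiroshima"] * 7 + ["Kyoto"] * 30
         + ["Tokyo"] * 4 + ["Hokkaido"] * 3)
print(dominant_region(shots))
```

With the counts from the example, Kyoto accounts for 30 of 50 images (0.6), so Kyoto would be the region whose explanatory sound is looked up.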

図５は、画像が撮像された時刻の分布の一例を示す。撮像期間判断部２８４は、画像が撮像された時間範囲を、部分領域毎に判断する。例えば、撮像期間判断部２８４は、長崎、広島、京都、静岡、東京、及び北海道のそれぞれを含む部分領域で撮像された時間範囲（ｔ１〜ｔ１０、ｔ１１〜ｔ１９、ｔ２０〜ｔ２９、ｔ３０〜ｔ３６、ｔ３７〜ｔ４４、及びｔ４５〜ｔ５０）を判断する。 FIG. 5 shows an example of the distribution of times at which images were captured. The imaging period determination unit 284 determines, for each partial area, the time range in which images were captured. For example, the imaging period determination unit 284 determines the time ranges (t1 to t10, t11 to t19, t20 to t29, t30 to t36, t37 to t44, and t45 to t50) in which images were captured in the partial areas containing Nagasaki, Hiroshima, Kyoto, Shizuoka, Tokyo, and Hokkaido, respectively.

そして、音声取得部２６２は、撮像期間の長さを判断する。例えば、音声取得部２６２は、長崎で撮像した画像が画像選択部２７８によって選択されたときには、長崎で撮像された期間（ｔ１〜ｔ１０）がより短いと判断して、長崎を説明する音声データを説明音データベース１７０から取得する。また、長崎、広島、京都、静岡、東京、及び北海道で撮像した画像が画像選択部２７８によって選択された場合には、撮像された期間（ｔ１〜ｔ５０）がより長いと判断して、撮像領域についての詳しい説明の音声データとして、例えば日本の詳しい説明の音声データを説明音データベース１７０から取得する。 Then, the sound acquisition unit 262 determines the length of the imaging period. For example, when images captured in Nagasaki are selected by the image selection unit 278, the sound acquisition unit 262 determines that the period of imaging in Nagasaki (t1 to t10) is shorter and acquires sound data describing Nagasaki from the explanatory sound database 170. When images captured in Nagasaki, Hiroshima, Kyoto, Shizuoka, Tokyo, and Hokkaido are selected by the image selection unit 278, the sound acquisition unit 262 determines that the imaging period (t1 to t50) is longer and acquires, from the explanatory sound database 170, sound data giving a detailed description of the imaging area, for example sound data giving a detailed description of Japan.

そして、音声取得部２６２は、選択された画像が撮像された期間に対する、特定の部分領域で撮像された期間の比率を計算して、当該比率が予め定めた比率よりも大きい場合に、当該特定の部分領域について説明する音声データを説明音データベース１７０から取得する。例えば、画像格納部２１０が、長崎、広島、京都、東京、及び北海道で撮像した画像を格納している場合に、京都で撮像された期間（ｔ６４〜ｔ９３）が、選択された画像が撮像された期間の合計（ｔ５１〜ｔ５６、ｔ５７〜ｔ６３、ｔ６４〜ｔ９３、ｔ９４〜ｔ９７、ｔ９８〜ｔ１０の合計期間）の半分の期間以上である場合に、音声取得部２６２は京都について説明する音声データを説明音データベース１７０から取得する。このため、出力装置１４０は、ユーザ１８０が特に長く滞在して撮像した場所について説明する音声をユーザ１８０に提供することができる。 Then, the sound acquisition unit 262 calculates the ratio of the period during which images were captured in a specific partial area to the period during which the selected images were captured, and when this ratio is larger than a predetermined ratio, acquires sound data describing that partial area from the explanatory sound database 170. For example, when the image storage unit 210 stores images captured in Nagasaki, Hiroshima, Kyoto, Tokyo, and Hokkaido, and the period of imaging in Kyoto (t64 to t93) is at least half of the total period over which the selected images were captured (the sum of the periods t51 to t56, t57 to t63, t64 to t93, t94 to t97, and t98 to t10), the sound acquisition unit 262 acquires sound data describing Kyoto from the explanatory sound database 170. The output device 140 can therefore provide the user 180 with sound describing the place where the user 180 stayed and captured images for an especially long time.

図６は、撮像装置６００のブロック構成の一例を示す。撮像装置６００は、図１で説明した撮像装置１００の他の例であって、特に撮像された画像とともに録音された音声から音楽、環境音等の特徴的な音声を抽出して格納する機能を有する。撮像装置６００は、モード設定部６９２、撮像部６０２、表示部６７０、データ格納部６９８、録音部６５０、特徴音抽出部６９４、条件格納部６６０、及び音声取得部６９６を備える。 FIG. 6 shows an example of a block configuration of the imaging apparatus 600. The imaging apparatus 600 is another example of the imaging apparatus 100 described with reference to FIG. 1 and, in particular, has a function of extracting characteristic sounds such as music and environmental sounds from the sound recorded together with captured images and storing them. The imaging apparatus 600 includes a mode setting unit 692, an imaging unit 602, a display unit 670, a data storage unit 698, a recording unit 650, a feature sound extraction unit 694, a condition storage unit 660, and a sound acquisition unit 696.

撮像部６０２は、画像を撮像する。撮像部６０２は、具体的には、被写体からの光をＣＣＤ等の撮像デバイスで受光して、被写体を撮像する。なお、撮像部６０２は所定の時間間隔で連続的に被写体を撮像してもよい。そして、撮像部６０２は、連続的に撮像して得られる所定の個数の画像を保持してもよい。そして、撮像部６０２は、撮像部６０２が保持している画像のうち、撮像を指示された時刻に最も近いタイミングで撮像された画像を撮像画像としてもよい。 The imaging unit 602 captures an image. Specifically, the imaging unit 602 captures the subject by receiving light from the subject with an imaging device such as a CCD. Note that the imaging unit 602 may continuously capture the subject at predetermined time intervals. Then, the imaging unit 602 may hold a predetermined number of images obtained by continuously capturing images. Then, the imaging unit 602 may use, as the captured image, an image captured at the timing closest to the time when the imaging is instructed, among the images held by the imaging unit 602.

表示部６７０は、撮像部６０２が有する受光素子により受光した光の画像を表示する。モード設定部６９２は、表示部６７０が画像を表示している状態の動作モードである撮像モード、又は表示部６７０が画像を表示していない状態の動作モードである非撮像モードに当該撮像装置６００を設定する。 The display unit 670 displays an image of the light received by the light receiving element of the imaging unit 602. The mode setting unit 692 sets the imaging apparatus 600 to either an imaging mode, which is an operation mode in which the display unit 670 is displaying an image, or a non-imaging mode, which is an operation mode in which the display unit 670 is not displaying an image.

録音部６５０は、撮像部６０２の周囲の音声を録音する。なお、録音部６５０は、モード設定部６９２が撮像モードに設定している場合、及びモード設定部６９２が非撮像モードに設定している場合の双方において、撮像部６０２の周囲の音声を録音する。 The recording unit 650 records the sound around the imaging unit 602. Note that the recording unit 650 records the sound around the imaging unit 602 both when the mode setting unit 692 is set to the imaging mode and when the mode setting unit 692 is set to the non-imaging mode. .

特徴音抽出部６９４は、録音部６５０が録音した音声から予め定められた種類の音声を抽出する。例えば、特徴音抽出部６９４は、録音部６５０が録音した音声から音楽を抽出する。この場合、特徴音抽出部６９４は、音声の周波数スペクトルに基づいて基本周波数を抽出して、音階を決定する。そして、特徴音抽出部６９４は、決定した音階に基づいて、リズム、テンポ、調性等の音楽の特徴量を判断して、音符データを抽出する。また、特徴音抽出部６９４は、更に、音符データに基づいて音楽のコード進行を抽出してもよい。 The feature sound extraction unit 694 extracts a predetermined type of sound from the sound recorded by the recording unit 650. For example, the feature sound extraction unit 694 extracts music from the sound recorded by the recording unit 650. In this case, the feature sound extraction unit 694 extracts the fundamental frequency based on the frequency spectrum of the speech and determines the scale. Then, the feature sound extraction unit 694 determines music feature values such as rhythm, tempo, and tonality based on the determined scale, and extracts note data. The feature sound extraction unit 694 may further extract the chord progression of music based on the note data.
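As a rough illustration of how a detected fundamental frequency can be mapped to a pitch, the following sketch uses the standard twelve-tone equal temperament relation with A4 = 440 Hz. The specification does not state which tuning or mapping is used, so the tuning reference and the function name `frequency_to_note` are assumptions made only for this example.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def frequency_to_note(freq_hz, a4_hz=440.0):
    """Map a fundamental frequency to the nearest equal-temperament note name.

    Assumes twelve-tone equal temperament with A4 = a4_hz (an assumption;
    the specification does not fix a tuning).
    """
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    # MIDI note number 69 corresponds to A4.
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))
    octave = midi // 12 - 1
    return f"{NOTE_NAMES[midi % 12]}{octave}"


print(frequency_to_note(440.0))
print(frequency_to_note(261.63))
```

A sequence of such note names over time is one possible form of the note data from which rhythm, tempo, tonality, and chord progression could then be estimated.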

条件格納部６６０は、特徴音抽出部６９４が抽出する環境音のそれぞれの種類を特定する条件を予め格納する。具体的には、条件格納部６６０は、犬、鳥の鳴き声、虫の鳴き声、波の音等の環境音の種類毎に、それぞれの種類の環境音に特徴的な周波数スペクトルを格納する。そして、特徴音抽出部６９４は、条件格納部６６０が格納する条件に一致する環境音を、録音部６５０が録音した音声から抽出する。例えば、特徴音抽出部６９４は、条件格納部６６０が格納する周波数スペクトルに予め定められた一致度以上で一致する環境音を、録音部６５０が録音した音声から抽出する。なお、条件格納部６６０は、環境音の音声そのものを、環境音の種類毎に格納してよい。この場合、特徴音抽出部６９４は、条件格納部６６０が格納する環境音の音声と録音部６５０が録音した音声とを比較して、音声の特徴量（例えば、周波数スペクトル等）が最も一致度の高い環境音を抽出して、当該環境音の種類を決定してよい。 The condition storage unit 660 stores in advance conditions for specifying each type of environmental sound extracted by the feature sound extraction unit 694. Specifically, the condition storage unit 660 stores a frequency spectrum that is characteristic of each type of environmental sound for each type of environmental sound, such as a dog, bird call, insect call, and wave sound. Then, the feature sound extraction unit 694 extracts the environmental sound that matches the conditions stored in the condition storage unit 660 from the sound recorded by the recording unit 650. For example, the feature sound extraction unit 694 extracts the environmental sound that matches the frequency spectrum stored in the condition storage unit 660 with a predetermined matching degree or higher from the sound recorded by the recording unit 650. The condition storage unit 660 may store the environmental sound itself for each type of environmental sound. In this case, the feature sound extraction unit 694 compares the sound of the environmental sound stored in the condition storage unit 660 and the sound recorded by the recording unit 650, and the feature amount (for example, frequency spectrum) of the sound is the most coincident. A high environmental sound may be extracted to determine the type of the environmental sound.
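The threshold-based spectrum matching described above might look like the following sketch. Cosine similarity is used here as the degree of match purely as an assumption; the specification does not name a particular similarity measure. The template values, the threshold of 0.9, and the function names are likewise hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length magnitude spectra."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify_environmental_sound(spectrum, templates, min_match=0.9):
    """Return the best-matching sound type, or None if below min_match."""
    best_type, best_score = None, 0.0
    for sound_type, template in templates.items():
        score = cosine_similarity(spectrum, template)
        if score > best_score:
            best_type, best_score = sound_type, score
    return best_type if best_score >= min_match else None


# Hypothetical four-band spectra standing in for the stored conditions.
templates = {
    "waves": [1.0, 0.5, 0.1, 0.0],
    "birdsong": [0.0, 0.1, 0.6, 1.0],
}
print(classify_environmental_sound([0.9, 0.5, 0.2, 0.0], templates))
```

Returning None when no template reaches the threshold corresponds to the case where no environmental sound of a stored type is extracted from the recording.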

なお、特徴音抽出部６９４は、撮像部６０２が画像を撮像した時刻から予め設定された時間内に、録音部６５０が録音した音声から予め定められた種類の音声を抽出する。例えば、特徴音抽出部６９４は、撮像部６０２が画像を撮像した時刻から予め設定された時間内に、録音部６５０が録音した音声から音楽又は環境音を抽出する。具体的には、特徴音抽出部６９４は、モード設定部６９２が撮像モードに設定している時間を含み、モード設定部６９２が撮像モードに設定している時間より長い予め設定された時間内に、録音部６５０が録音した音声から予め定められた種類の音声を抽出する。より具体的には、特徴音抽出部６９４は、モード設定部６９２が撮像モードに設定している時間を含み、モード設定部６９２が撮像モードに設定している時間より長い、予め設定された時間内に、録音部６５０が録音した音声から音楽又は環境音を抽出する。 Note that the feature sound extraction unit 694 extracts a predetermined type of sound from the sound recorded by the recording unit 650 within a preset time from the time at which the imaging unit 602 captured an image. For example, the feature sound extraction unit 694 extracts music or environmental sound from the sound recorded by the recording unit 650 within that preset time. Specifically, the feature sound extraction unit 694 extracts the predetermined type of sound from the sound recorded by the recording unit 650 within a preset period that includes, and is longer than, the time during which the mode setting unit 692 keeps the imaging mode set. More specifically, the feature sound extraction unit 694 extracts music or environmental sound from the sound recorded by the recording unit 650 within that period.

音楽データベース１７２は、時代別に複数の音楽を格納する。また、音楽データベース１７２は、ジャンル別に複数の音楽を格納する。具体的には、音楽データベース１７２は、音楽データに対応づけて、音楽のジャンル及び時代を格納する。また、音楽データベース１７２は、音楽データに対応づけて、音符データ、リズム、テンポ、調性、及びコード進行等の、音楽の特徴量を格納してよい。他にも、音楽データベース１７２は、音楽データに対応づけて、当該音楽に関連する人物、例えば作曲者、作詞者、編曲者、演奏者等を格納してもよい。また、音楽データベース１７２は、音楽データに対応づけて、当該音楽が発信される地域の位置を示す発信位置、発信される発信時刻、及び発信手段を格納してもよい。なお、発信手段とは、例えばラジオ、有線等であってよい。また、発信時刻とは、例えば放送局の番組データ等、音楽が放送されるべき時刻を示す情報であってよい。また、音楽データベース１７２は、音楽データに対応づけて、当該音楽が複数の地域においてヒットした程度を示す情報を、地域毎及び時代毎に格納してもよい。 The music database 172 stores a plurality of music by era. The music database 172 stores a plurality of music by genre. Specifically, the music database 172 stores music genres and times in association with music data. The music database 172 may store music feature values such as note data, rhythm, tempo, tonality, and chord progression in association with the music data. In addition, the music database 172 may store a person related to the music, for example, a composer, a lyricist, an arranger, a performer, or the like in association with the music data. Further, the music database 172 may store a transmission position indicating a position of a region where the music is transmitted, a transmission time at which the music is transmitted, and a transmission unit in association with the music data. Note that the transmission means may be, for example, a radio or a cable. The transmission time may be information indicating the time at which music is to be broadcast, such as program data of a broadcasting station. The music database 172 may store information indicating the degree to which the music hits in a plurality of regions in association with the music data for each region and each era.

音声取得部６９６は、複数の種類の音声を格納する音声データベースから、特徴音抽出部６９４が抽出した音声と同一の種類の音声を取得する。具体的には、音声取得部６９６は、音楽データベース１７２から、特徴音抽出部６９４が抽出した音楽と同一の音楽を取得する。具体的には、音声取得部６９６は、特徴音抽出部６９４が抽出した音符データと一致する音符データを有する音楽を音楽データベース１７２から取得する。このとき、音声取得部６９６は、撮像部６０２が画像を撮像したタイミングにおいて撮像装置６００の撮像位置及び撮像時刻を検出し、当該撮像位置を含む発信位置及び当該撮像時刻を含む発信時刻に対応づけて音楽データベース１７２に格納された音楽データの中から、特徴音抽出部６９４が抽出した音楽と同一の音楽を取得してよい。このとき、音声取得部６９６は、音楽データベース１７２に格納された音楽データのうち、撮像位置を含む地域及び撮像時刻を含む時代においてヒットした程度がより高い音楽を優先して検索し、取得してよい。また、音声取得部６９６は、音楽が発信されるべき発信手段を撮像位置に基づいて特定して、当該発信手段によって発信される音楽から順に検索してもよい。例えば、音声取得部６９６は、住宅街で録音された音楽を取得する場合には、ラジオ放送で放送されるべき音楽から順に検索し、遊園地等で録音された音楽を取得する場合には、有線放送によって放送されるべき音楽から順に検索してよい。 The sound acquisition unit 696 acquires, from a sound database that stores a plurality of types of sound, sound of the same type as the sound extracted by the feature sound extraction unit 694. Specifically, the sound acquisition unit 696 acquires from the music database 172 the same music as the music extracted by the feature sound extraction unit 694; that is, it acquires from the music database 172 music whose note data matches the note data extracted by the feature sound extraction unit 694. In doing so, the sound acquisition unit 696 may detect the imaging position and imaging time of the imaging apparatus 600 at the timing at which the imaging unit 602 captured the image, and may acquire the same music as the extracted music from among the music data stored in the music database 172 in association with a transmission position that includes the imaging position and a transmission time that includes the imaging time. The sound acquisition unit 696 may also search preferentially for, and acquire, music in the music database 172 that was a bigger hit in the region including the imaging position and in the era including the imaging time. In addition, the sound acquisition unit 696 may identify, based on the imaging position, the transmission means by which the music would have been transmitted, and search first among the music transmitted by that means. For example, when acquiring music recorded in a residential area, the sound acquisition unit 696 may search first among music scheduled for radio broadcast, and when acquiring music recorded at an amusement park or the like, it may search first among music scheduled for cable broadcast.

また、音声取得部６９６は、複数の音楽を格納する音楽データベース１７２から、特徴音抽出部６９４が抽出した音楽と同一の種類の音楽を取得する。具体的には、音声取得部６９６は、音楽データベース１７２から、特徴音抽出部６９４が抽出した音楽と同じ時代の音楽を取得する。また、音声取得部６９６は、音楽データベース１７２から、特徴音抽出部６９４が抽出した音楽と同じジャンルの音楽を取得する。具体的には、音声取得部６９６は、特徴音抽出部６９４が抽出したリズム、テンポ、調性、及びコード進行等の特徴量に基づいて音楽のジャンル及び／又は時代を特定して、特定したジャンル及び／又は時代の音楽を音楽データベース１７２から取得する。他にも、音声取得部６９６は、特徴音抽出部６９４が抽出した特徴量に基づいて、当該特徴量の音楽を音楽データベース１７２を検索することによって、当該音楽に関連する人物を特定して、特定した人物に対応づけられた音楽を音楽データベース１７２から取得してもよい。なお、音声取得部６９６は、音楽データベース１７２に格納された音楽データのうち、撮像位置を含む地域及び撮像時刻を含む期間においてヒットした程度が最も高い音楽を取得してよい。 In addition, the voice acquisition unit 696 acquires the same type of music as the music extracted by the feature sound extraction unit 694 from the music database 172 storing a plurality of music. Specifically, the voice acquisition unit 696 acquires, from the music database 172, music of the same era as the music extracted by the feature sound extraction unit 694. Also, the voice acquisition unit 696 acquires music of the same genre as the music extracted by the feature sound extraction unit 694 from the music database 172. Specifically, the voice acquisition unit 696 specifies and specifies the genre and / or period of music based on feature quantities such as rhythm, tempo, tonality, and chord progression extracted by the feature sound extraction unit 694. Genre and / or period music is obtained from the music database 172. In addition, the voice acquisition unit 696 specifies a person related to the music by searching the music database 172 for music of the feature amount based on the feature amount extracted by the feature sound extraction unit 694. Music associated with the specified person may be acquired from the music database 172. Note that the voice acquisition unit 696 may acquire music having the highest degree of hit among the music data stored in the music database 172 in the region including the imaging position and the period including the imaging time.

また、音声取得部６９６は、環境音データベース１７４から、特徴音抽出部６９４が抽出した環境音と同一の種類の環境音を取得する。なお、条件格納部６６０が環境音そのものを格納している場合には、音声取得部６９６は、特徴音抽出部６９４が抽出した音声と同種の音声を条件格納部６６０から取得してもよい。 In addition, the sound acquisition unit 696 acquires from the environmental sound database 174 the same type of environmental sound as the environmental sound extracted by the feature sound extraction unit 694. When the condition storage unit 660 stores the environmental sound itself, the sound acquisition unit 696 may acquire the same type of sound as the sound extracted by the feature sound extraction unit 694 from the condition storage unit 660.

データ格納部６９８は、音声取得部６９６が取得した音声と撮像部６０２が撮像した画像とを同期して出力させるべく対応づけて格納する。具体的には、データ格納部６９８は、音声取得部６９６が取得した音楽と撮像部６０２が撮像した画像とを同期して出力させるべく対応づけて格納する。他にも、データ格納部６９８は、音声取得部６９６が取得した環境音と撮像部６０２が撮像した画像とを同期して出力させるべく対応づけて格納する。以上説明した撮像装置６００によれば、撮像装置６００によって撮像したときに撮像装置６００の周囲を流れていたＢＧＭと同じ音楽を、画像とともにユーザ１８０に提供することができる。また、撮像装置６００は、撮像時の周囲の環境音を画像とともにユーザ１８０に提供することができる。 The data storage unit 698 stores the audio acquired by the audio acquisition unit 696 and the image captured by the imaging unit 602 in association with each other so as to be output in synchronization. Specifically, the data storage unit 698 stores the music acquired by the voice acquisition unit 696 and the image captured by the imaging unit 602 in association with each other so as to be output in synchronization. In addition, the data storage unit 698 stores the environmental sound acquired by the audio acquisition unit 696 and the image captured by the imaging unit 602 in association with each other so as to be output in synchronization. According to the imaging device 600 described above, it is possible to provide the user 180 with the same music as the BGM flowing around the imaging device 600 when the imaging device 600 takes an image. In addition, the imaging apparatus 600 can provide ambient sound at the time of imaging to the user 180 together with the image.

図７は、音楽データベース１７２が格納するデータの一例をテーブル形式示す。音楽データベース１７２は、音楽データに対応づけて、時代、音楽が属するジャンル、音楽が有するリズム、音楽が有するテンポ、音楽が有する調性、音楽が有するコード進行、音楽の楽譜を示す音符データ、音楽の原盤権を保有するレコード会社、音楽を制作したレーベル、音楽がヒットしたヒット度を格納する。なお、音楽データベース１７２が格納する時代とは、音楽が作成された時代、音楽が発表された時代、音楽が流行した時代等であってよい。なお、音楽データベース１７２は、本図において例示した属性の他に、音楽をプロデュースしたプロデューサ、音楽が含まれる音楽アルバム、音楽をダウンロードする場合に課金される課金情報等、音楽に関連する様々な属性を格納してよいことは言うまでもない。 FIG. 7 shows, in table form, an example of the data stored in the music database 172. In association with music data, the music database 172 stores the era, the genre to which the music belongs, the rhythm, tempo, tonality, and chord progression of the music, note data representing the score of the music, the record company that holds the master rights to the music, the label that produced the music, and a hit degree indicating how big a hit the music was. The era stored in the music database 172 may be the era in which the music was created, the era in which it was released, the era in which it was popular, or the like. Needless to say, in addition to the attributes illustrated in this figure, the music database 172 may store various other attributes related to the music, such as the producer who produced it, the album that contains it, and billing information for charges incurred when it is downloaded.

図８は、撮像装置６００が録音する音声と時間範囲の一例を示す。撮像装置６００は、動作モードとして、撮像モード、出力モード、及び待機モードを有する。撮像モードとは、撮像装置６００が撮像及び／又は録音することのできる動作モードであってよい。また、出力モードは、例えば、撮像装置６００が画像及び／又は音声を出力することのできる動作モードであってよい。なお、非撮像モードは、出力モード及び待機モードを含む。そして、撮像装置６００は、撮像モードに設定されている期間（ｔ１〜ｔ３）、出力モードに設定されている期間（ｔ３〜ｔ４）、及び待機モードに設定されている期間（ｔ０〜ｔ１及びｔ４〜ｔ５）における撮像装置６００の周囲の音声を録音する。 FIG. 8 shows an example of the sound recorded by the imaging apparatus 600 and its time ranges. The imaging apparatus 600 has an imaging mode, an output mode, and a standby mode as operation modes. The imaging mode may be an operation mode in which the imaging apparatus 600 can capture images and/or record sound. The output mode may be, for example, an operation mode in which the imaging apparatus 600 can output images and/or sound. The non-imaging mode includes the output mode and the standby mode. The imaging apparatus 600 records the sound around it during the period in which it is set to the imaging mode (t1 to t3), the period in which it is set to the output mode (t3 to t4), and the periods in which it is set to the standby mode (t0 to t1 and t4 to t5).

なお、撮像装置６００は、起動された直後には撮像装置６００は待機モードに設定される。撮像装置６００は、動作モードが待機モード又は出力モードに設定されている場合に、ユーザ１８０によって撮像動作又は録音動作に関する操作がなされた場合に撮像モードに遷移する。撮像動作に関する操作は、例えば、画像を撮像する操作、シャッタスピード、焦点距離等の、撮像条件を調整する操作等を含む。また、録音動作に関する操作は、例えば、音声を録音する操作、録音感度の調整等の、録音条件を調整する操作等を含む。また、撮像装置６００は、動作モードが待機モード又は撮像モードに設定されている場合に、ユーザ１８０によって撮像装置６００の出力動作に関する操作がなされた場合に、出力モードに遷移する。出力動作に関する操作は、例えば、画像を出力する操作、出力する画像を選択する操作、出力速度の調節等の、出力条件を調整する操作等を含む。なお、撮像装置６００は、撮像装置６００が撮像モード又は出力モードに設定されている場合に、ユーザ１８０による撮像装置６００の操作が所定の期間操作されなかったことを条件として、待機モードに遷移してよい。 Immediately after being started, the imaging apparatus 600 is set to the standby mode. When the operation mode is the standby mode or the output mode and the user 180 performs an operation related to imaging or recording, the imaging apparatus 600 transitions to the imaging mode. Operations related to imaging include, for example, an operation to capture an image and operations to adjust imaging conditions such as shutter speed and focal length. Operations related to recording include, for example, an operation to record sound and operations to adjust recording conditions such as recording sensitivity. When the operation mode is the standby mode or the imaging mode and the user 180 performs an operation related to output, the imaging apparatus 600 transitions to the output mode. Operations related to output include, for example, an operation to output an image, an operation to select the image to be output, and operations to adjust output conditions such as output speed. When the imaging apparatus 600 is set to the imaging mode or the output mode, it may transition to the standby mode on condition that the user 180 has not operated it for a predetermined period.
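The mode transitions described above can be summarized as a small state machine. The following is a minimal sketch; the names `Mode`, `Event`, and `next_mode` are hypothetical and chosen only for illustration, and the idle timeout is modeled as a single event.

```python
from enum import Enum


class Mode(Enum):
    STANDBY = "standby"
    IMAGING = "imaging"
    OUTPUT = "output"


class Event(Enum):
    CAPTURE_OR_RECORD_OP = "capture_or_record_op"  # imaging or recording operation
    OUTPUT_OP = "output_op"                        # output-related operation
    IDLE_TIMEOUT = "idle_timeout"                  # no operation for a set period


def next_mode(mode, event):
    """Return the mode after `event`, following the transitions in the text."""
    if event is Event.CAPTURE_OR_RECORD_OP and mode in (Mode.STANDBY, Mode.OUTPUT):
        return Mode.IMAGING
    if event is Event.OUTPUT_OP and mode in (Mode.STANDBY, Mode.IMAGING):
        return Mode.OUTPUT
    if event is Event.IDLE_TIMEOUT and mode in (Mode.IMAGING, Mode.OUTPUT):
        return Mode.STANDBY
    return mode  # all other combinations leave the mode unchanged


mode = Mode.STANDBY  # the apparatus starts in standby mode
for event in (Event.CAPTURE_OR_RECORD_OP, Event.OUTPUT_OP, Event.IDLE_TIMEOUT):
    mode = next_mode(mode, event)
    print(mode.value)
```

Because recording continues in every mode, the state only determines which periods count as the imaging mode when the extraction window is chosen later.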

特徴音抽出部６９４は、撮像時刻ｔ２から予め定めた時間だけ前又は後の時間範囲において、録音部６５０で録音された音声から音楽を抽出する。例えば、撮像装置６００がユーザ１８０から時刻ｔ２において撮像するよう指示された場合に、特徴音抽出部６９４は、撮像時刻ｔ２を含む撮像モードに設定されていた期間（ｔ１〜ｔ３）を含む、待機モード又は出力モードに設定されていた期間、例えば期間（ｔ０〜ｔ５）において録音された音声から音楽を抽出する。 The feature sound extraction unit 694 extracts music from the sound recorded by the recording unit 650 in a time range extending a predetermined time before or after the imaging time t2. For example, when the user 180 instructs the imaging apparatus 600 to capture an image at time t2, the feature sound extraction unit 694 extracts music from the sound recorded in a period, for example the period (t0 to t5), that contains the period (t1 to t3) in which the imaging mode including the imaging time t2 was set and also covers periods in which the standby mode or the output mode was set.

なお、特徴音抽出部６９４は、撮像時刻ｔ２を含む、期間（ｔ０〜ｔ５）において録音部６５０によって録音された音声のうち、撮像時刻ｔ２の最も近い時刻で録音された音声から音楽を抽出してよい。また、特徴音抽出部６９４は、最も音量の大きい音量の音声の中から音楽を抽出してよい。 Note that, of the sound recorded by the recording unit 650 in the period (t0 to t5) that includes the imaging time t2, the feature sound extraction unit 694 may extract music from the sound recorded at the time closest to the imaging time t2. The feature sound extraction unit 694 may also extract music from the loudest sound.

図９は、音楽を取得する手順の一例を示す。特徴音抽出部６９４は、録音部６５０が録音した音声から撮像した時刻を含む期間の音声を抽出する（Ｓ９１２）。そして、特徴音抽出部６９４は、Ｓ９１２で抽出した期間の音声から、音楽の特徴量を抽出する（Ｓ９１４）。音楽の特徴量とは、例えば音符データ、リズム、テンポ、調性、コード進行等であってよい。 FIG. 9 shows an example of a procedure for acquiring music. The feature sound extraction unit 694 extracts the sound in the period including the time taken from the sound recorded by the recording unit 650 (S912). Then, the feature sound extraction unit 694 extracts a feature amount of music from the sound of the period extracted in S912 (S914). The feature amount of music may be, for example, note data, rhythm, tempo, tonality, chord progression, or the like.

そして、音声取得部６９６は、特徴音抽出部６９４が抽出した音符データと一致する音楽を音楽データベース１７２の中から検索する（Ｓ９１６）。そして、特徴音抽出部６９４は、抽出した音符データと一致する音楽が音楽データベース１７２に格納されているか否かを判断する（Ｓ９１８）。音声取得部６９６は、Ｓ９１８において、一致する音楽があると判断した場合には、音楽データベース１７２の中から一致する音楽を取得する（Ｓ９２０）。 Then, the voice acquisition unit 696 searches the music database 172 for music that matches the note data extracted by the feature sound extraction unit 694 (S916). Then, the feature sound extraction unit 694 determines whether or not music that matches the extracted note data is stored in the music database 172 (S918). If it is determined in S918 that there is matching music, the voice acquisition unit 696 acquires matching music from the music database 172 (S920).

音声取得部６９６は、Ｓ９１８において一致する音楽がないと判断した場合には、特徴音抽出部６９４がＳ９１４で抽出した音楽の特徴量に基づいて、Ｓ９１２で抽出した音楽と同じジャンル及び／又は年代を特定する（Ｓ９２２）。例えば、音声取得部６９６は、音楽データベース１７２に格納された音楽の中で、最も類似する特徴量を持つ音楽を最も多く含むジャンル及び／又は時代を特定する。そして、音声取得部６９６は、Ｓ９２２で特定したジャンル及び／又は年代の音楽を、音楽データベース１７２に格納された音楽の中から取得する（Ｓ９２４）。また、音声取得部６９６は、Ｓ９２２において、ジャンル、年代の他にも、類似する特徴量を持つ最も多くの音楽に対応づけて音楽データベース１７２が格納しているレコード会社又はレーベルを特定してよい。そして、音声取得部６９６は、Ｓ９２４において、Ｓ９２２において特定したレコード会社又はレーベルの音楽を、音楽データベース１７２に格納された音楽の中から取得してよい。なお、Ｓ９２４において、音声取得部６９６は、同じ種類の音楽が複数存在する場合には、音楽データベース１７２が格納するヒット度が最も高い音楽を取得してよい。 If the sound acquisition unit 696 determines that there is no matching music in S918, based on the feature amount of the music extracted in S914 by the feature sound extraction unit 694, the same genre and / or age as the music extracted in S912 Is specified (S922). For example, the voice acquisition unit 696 identifies a genre and / or era that includes the most music having the most similar feature amount among the music stored in the music database 172. Then, the voice acquisition unit 696 acquires the music of the genre and / or age specified in S922 from the music stored in the music database 172 (S924). In S922, the voice acquisition unit 696 may specify the record company or label stored in the music database 172 in association with the most music having similar feature values in addition to the genre and age. . In S924, the voice acquisition unit 696 may acquire the music of the record company or label specified in S922 from the music stored in the music database 172. In S924, the voice acquisition unit 696 may acquire the music with the highest hit degree stored in the music database 172 when there are a plurality of the same type of music.
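The flow of FIG. 9 (exact match first, then a fallback to music of the same genre) can be sketched as follows. This is a simplified illustration under stated assumptions: tracks are plain dictionaries with hypothetical keys, similarity is reduced to tempo distance, only the genre (not the era, record company, or label) is used in the fallback, and the function name `acquire_music` and the parameter `k` are not from the specification.

```python
from collections import Counter


def acquire_music(extracted, database, k=3):
    """Return a track for the extracted features, following the FIG. 9 flow."""
    # S916 to S920: look for a track whose note data matches exactly.
    for track in database:
        if track["notes"] == extracted["notes"]:
            return track
    # S922: take the k tracks with the most similar feature (tempo only here)
    # and identify the genre that most of them belong to.
    nearest = sorted(database, key=lambda t: abs(t["tempo"] - extracted["tempo"]))[:k]
    if not nearest:
        return None
    genre = Counter(t["genre"] for t in nearest).most_common(1)[0][0]
    # S924: among tracks of that genre, pick the one with the highest hit degree.
    candidates = [t for t in database if t["genre"] == genre]
    return max(candidates, key=lambda t: t["hit"])


# Hypothetical database entries for illustration only.
db = [
    {"title": "A", "notes": ("C4", "E4", "G4"), "tempo": 120, "genre": "pop", "hit": 70},
    {"title": "B", "notes": ("D4", "F4", "A4"), "tempo": 124, "genre": "pop", "hit": 90},
    {"title": "C", "notes": ("E4", "G4", "B4"), "tempo": 60, "genre": "ballad", "hit": 95},
    {"title": "D", "notes": ("A3", "C4", "E4"), "tempo": 118, "genre": "rock", "hit": 50},
]
print(acquire_music({"notes": ("C4", "E4", "G4"), "tempo": 120}, db)["title"])
print(acquire_music({"notes": ("G4", "G4"), "tempo": 122}, db)["title"])
```

In the second call no note data matches, so the three tracks nearest in tempo (A, B, and D) decide the genre, and the highest-rated pop track is returned.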

なお、音声取得部６９６は、Ｓ９１８で同一の音楽と判断された音楽が音楽データベース１７２から複数検索された場合は、検索された複数の音楽を音楽データベース１７２から取得してユーザ１８０に選択させてもよい。また、音声取得部６９６は、Ｓ９２４においても、音楽データベース１７２から取得した同じジャンル及び／又は年代の音楽を複数取得して、ユーザ１８０に選択させてもよい。他にも、音声取得部６９６は、検索された複数の音楽のうち音楽データベース１７２が格納するヒット度が最も高い音楽を取得してよい。また、音声取得部６９６は、複数の音楽が検索された場合には、検索された複数の音楽のうち最も多い数の音楽に対応づけて音楽データベース１７２が格納しているレコード会社又はレーベルの音楽を特定してよい。そして、音声取得部６９６は、特定したレコード会社又はレーベルの年代の音楽を、音楽データベース１７２に格納された音楽の中から取得してよい。なお、Ｓ９２０において音声取得部６９６は、Ｓ９１８で同一の音楽と判断された音楽と同じレコード会社又はレーベルの音楽を音楽データベース１７２から取得してもよい。また、音声取得部６９６は、レコード会社又はレーベルの他にも、Ｓ９１８で同一の音楽と判断された音楽と同じ属性に対応づけて音楽データベース１７２に格納されている音楽を、音楽データベース１７２から取得してもよい。 Note that when a plurality of pieces of music judged in S918 to be the same music are found in the music database 172, the sound acquisition unit 696 may acquire the found pieces from the music database 172 and let the user 180 choose among them. Likewise, in S924, the sound acquisition unit 696 may acquire a plurality of pieces of music of the same genre and/or era from the music database 172 and let the user 180 choose among them. Alternatively, the sound acquisition unit 696 may acquire, from among the pieces found, the one with the highest hit degree stored in the music database 172. When a plurality of pieces are found, the sound acquisition unit 696 may also identify the record company or label that the music database 172 stores in association with the largest number of the found pieces, and may then acquire music of the identified record company or label from the music stored in the music database 172. In S920, the sound acquisition unit 696 may also acquire from the music database 172 music of the same record company or label as the music judged in S918 to be the same music. Besides the record company or label, the sound acquisition unit 696 may acquire from the music database 172 music stored in association with any other attribute shared with the music judged in S918 to be the same music.

以上、図７から図９にかけて、音声取得部６９６が音楽データベース１７２から音楽を取得する動作について説明したが、同様の動作によって、音声取得部６９６は環境音データベース１７４から環境音を取得することができる。また、音声取得部６９６は、音楽、環境音の他にも、画像に対応づけて記録すべき予め定められた様々な種類の音声を、音声データベースから取得してよいことは言うまでもない。 As described above, the operation in which the sound acquisition unit 696 acquires music from the music database 172 has been described with reference to FIGS. 7 to 9. However, the sound acquisition unit 696 can acquire the environmental sound from the environmental sound database 174 by the same operation. it can. Needless to say, the sound acquisition unit 696 may acquire various types of predetermined sounds to be recorded in association with images, in addition to music and environmental sounds, from the sound database.

図１０は、出力装置１０４０のブロック構成の一例を示す。出力装置１０４０は、音声提供システムが有する出力装置１４０の他の一例であってよい。なお、撮像装置１００は、撮像した画像及び撮像したときの周囲の音声の他に、周囲の音声を録音した時刻も出力装置１０４０に送信する。 FIG. 10 shows an example of a block configuration of the output device 1040. The output device 1040 may be another example of the output device 140 included in the voice providing system. In addition to the captured image and the surrounding sound at the time of capturing, the image capturing apparatus 100 also transmits the time when the surrounding sound is recorded to the output device 1040.

出力装置１０４０は、画像格納部１０１０、出力部１０２４、音声取得部１０９６、特徴音抽出部１０９４、条件格納部１０６０、音声格納部１０２０、許容時間設定部１０４３、出力時刻検出部１０４４、出力要求取得部１０４８を備える。 The output device 1040 includes an image storage unit 1010, an output unit 1024, a sound acquisition unit 1096, a feature sound extraction unit 1094, a condition storage unit 1060, a sound storage unit 1020, an allowable time setting unit 1043, an output time detection unit 1044, and an output request acquisition unit 1048.

画像格納部１０１０は、撮像装置１００によって撮像された画像を格納する。また、画像格納部１０１０は、画像に対応づけて当該画像が撮像された撮像時刻を格納する。 The image storage unit 1010 stores an image captured by the imaging device 100. The image storage unit 1010 stores the imaging time when the image is captured in association with the image.

音声格納部１０２０は、撮像装置１００によって録音された音声を格納する。音声格納部１０２０は、音声に対応づけて当該音声の録音時刻を格納する。具体的には、音声格納部１０２０は、撮像装置１００の周囲の音声を格納する。なお、録音時刻とは、音声の録音を開始した時刻であってよく、録音を終了した時刻であってよい。 The sound storage unit 1020 stores the sound recorded by the imaging device 100. The voice storage unit 1020 stores the recording time of the voice in association with the voice. Specifically, the sound storage unit 1020 stores sound around the imaging device 100. Note that the recording time may be the time when voice recording starts or the time when recording ends.

特徴音抽出部１０９４は、音声格納部１０２０が格納する音声から予め定められた種類の音声を抽出する。具体的には、音声取得部１０９６は、複数の種類の音声を格納する音声データベースから、特徴音抽出部１０９４が抽出した音声と同一の種類の音声を取得する。例えば、特徴音抽出部１０９４は、音声格納部１０２０が格納する音声から音楽を抽出する。そして、音声取得部１０９６は、複数の音楽を格納する音楽データベース１７２から、特徴音抽出部１０９４が抽出した音楽と同一の種類の音楽を取得する。他にも、音声取得部１０９６は、環境音データベース１７４から、特徴音抽出部１０９４が抽出した環境音と同一の種類の環境音を取得する。なお、音声取得部１０９６が音楽又は環境音等の音声を取得する具体的な動作は、図９で説明した音声取得部６９６の動作と同一であるので、説明を省略する。 The feature sound extraction unit 1094 extracts a predetermined type of sound from the sound stored in the sound storage unit 1020. Specifically, the sound acquisition unit 1096 acquires, from a sound database that stores a plurality of types of sound, sound of the same type as the sound extracted by the feature sound extraction unit 1094. For example, the feature sound extraction unit 1094 extracts music from the sound stored in the sound storage unit 1020. The sound acquisition unit 1096 then acquires, from the music database 172 storing a plurality of pieces of music, music of the same type as the music extracted by the feature sound extraction unit 1094. The sound acquisition unit 1096 also acquires, from the environmental sound database 174, environmental sound of the same type as the environmental sound extracted by the feature sound extraction unit 1094. The specific operation by which the sound acquisition unit 1096 acquires sound such as music or environmental sound is the same as the operation of the sound acquisition unit 696 described with reference to FIG. 9, and its description is therefore omitted.

The output request acquisition unit 1048 acquires a request to output an image stored in the image storage unit 1010. The allowable time setting unit 1043 sets the allowable time, which is the period from which sound is extracted, to be longer as the difference between the time at which the output request acquisition unit 1048 acquired the output request and the capture time of the image stored in the image storage unit 1010 becomes larger. The feature sound extraction unit 1094 then extracts music from sound recorded within the preset allowable time from the time at which the image was captured. The output unit 1024 outputs the sound acquired by the sound acquisition unit 1096 in synchronization with the image stored in the image storage unit 1010. Specifically, the output unit 1024 outputs the music or environmental sound acquired by the sound acquisition unit 1096 in synchronization with the image stored in the image storage unit 1010.

According to the output device 1040 of this embodiment, a captured image can be provided to the user 180 together with music that was playing when the image was captured, for example music that was popular at the time of capture.

FIG. 11 shows an example of the allowable time set by the allowable time setting unit 1043. For example, when an instruction to output an image captured at time t12 is received from the user 180 at time t13, the allowable time setting unit 1043 determines, based on the difference (t13 − t12) between the time of the output instruction and the capture time of the image to be output, the allowable range Δt52 of sound from which the feature sound extraction unit 1094 is to extract music. The feature sound extraction unit 1094 then extracts music from the sound stored in the sound storage unit 1020 that was recorded in the time range extending Δt52 before and after time t12 (from time t12 − Δt52 to time t12 + Δt52).

The feature sound extraction unit 1094 may instead extract sound from sound recorded between the time that precedes time t12 by the allowable range Δt52 and time t12, or from sound recorded between time t12 and the time that follows time t12 by the allowable range Δt52.
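The choice among these time ranges can be illustrated with a short sketch. It assumes a hypothetical `Recording` record carrying a single recording time, and a hypothetical `mode` parameter that selects the two-sided range or one of the one-sided ranges; neither name comes from this description.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Recording:
    recorded_at: datetime  # recording time stored with the sound
    name: str


def recordings_in_window(recordings: List[Recording],
                         capture_time: datetime,
                         allowable: timedelta,
                         mode: str = "both") -> List[Recording]:
    """Select recordings made within the allowable range of the capture time.

    mode: "both"   -> capture_time - allowable .. capture_time + allowable
          "before" -> capture_time - allowable .. capture_time
          "after"  -> capture_time .. capture_time + allowable
    """
    if mode not in ("both", "before", "after"):
        raise ValueError("mode must be 'both', 'before', or 'after'")
    start = capture_time - allowable if mode in ("both", "before") else capture_time
    end = capture_time + allowable if mode in ("both", "after") else capture_time
    return [r for r in recordings if start <= r.recorded_at <= end]
```

Music extraction would then run only on the recordings returned here. Treating the range boundaries as inclusive is a choice made for this sketch.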

The allowable time setting unit 1043 sets a longer allowable time as the difference between the time at which the captured image stored in the image storage unit 1010 was captured and the time at which the output instruction was received becomes larger. In the example of FIG. 11, when instructed at time t13 to output an image captured at time t11, which precedes time t12, the allowable time setting unit 1043 sets an allowable range Δt51 that is longer than the allowable range Δt52. The feature sound extraction unit 1094 then extracts music from sound recorded within the time range from time (t11 − Δt51) to time (t11 + Δt51).

The allowable time setting unit 1043 may set, as the allowable time, the period obtained by dividing the time between capture and the output instruction by a predetermined number. In this case, when outputting an image captured ten days ago, for example, the feature sound extraction unit 1094 extracts music from sound recorded within one day before and after the capture time. When outputting an image captured ten years ago, the feature sound extraction unit 1094 extracts music from sound recorded within one year before and after the capture time.
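The division rule in the preceding paragraph amounts to a single calculation, sketched below. The function name and the default divisor of 10 are assumptions chosen to match the ten-day and ten-year examples; the description only says "a predetermined number".

```python
from datetime import datetime, timedelta


def allowable_time(capture_time: datetime,
                   request_time: datetime,
                   divisor: int = 10) -> timedelta:
    """Allowable time = (output request time - capture time) / divisor.

    The divisor of 10 is an assumed value that reproduces the examples
    in the text (ten days -> one day, ten years -> about one year).
    """
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    elapsed = request_time - capture_time
    if elapsed < timedelta(0):
        raise ValueError("request_time must not precede capture_time")
    return elapsed / divisor
```

With this rule, an image captured ten days before the output request yields an allowable time of one day, and the range grows in proportion to the age of the image.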

According to the output device 1040 described above, music that was playing at places the user 180 visited, such as FM broadcast music heard in a car while traveling through a tourist area or cable broadcast music playing in a shop visited there, can be appropriately identified, downloaded, and played back together with the image. The output device 1040 can also download environmental sound of the same kind as the environmental sound at a place the user 180 visited and play it back together with the image. Furthermore, when outputting an image captured further in the past, the output device 1040 selects sound, including music and environmental sound, from sound recorded in a wider time range that includes the capture time, so the user 180 can enjoy viewing the image while recalling the music that was most popular at the time of capture.

FIG. 12 shows an example of the hardware configuration of a computer 1500 associated with the output device 140, the imaging device 600, and the output device 1040. The computer 1500 includes: a CPU peripheral section having a CPU 1505, a RAM 1520, a graphic controller 1575, and a display device 1580, which are interconnected by a host controller 1582; an input/output section having a communication interface 1530, a hard disk drive 1540, and a CD-ROM drive 1560, which are connected to the host controller 1582 by an input/output controller 1584; and a legacy input/output section having a ROM 1510, a flexible disk drive 1550, and an input/output chip 1570, which are connected to the input/output controller 1584.

The host controller 1582 connects the RAM 1520 to the CPU 1505 and the graphic controller 1575, which access the RAM 1520 at a high transfer rate. The CPU 1505 operates based on programs stored in the ROM 1510 and the RAM 1520 and controls each unit. The graphic controller 1575 acquires image data that the CPU 1505 or the like generates in a frame buffer provided in the RAM 1520, and displays it on the display device 1580. Alternatively, the graphic controller 1575 may itself contain a frame buffer that stores image data generated by the CPU 1505 or the like.

The input/output controller 1584 connects the host controller 1582 to the hard disk drive 1540, the communication interface 1530, and the CD-ROM drive 1560, which are relatively high-speed input/output devices. The hard disk drive 1540 stores programs and data used by the CPU 1505 in the computer 1500. The communication interface 1530 communicates with the output device 140, the imaging device 600, or the output device 1040 via a network, and provides programs and data to the output device 140, the imaging device 600, or the output device 1040. The CD-ROM drive 1560 reads a program or data from the CD-ROM 1595 and provides it to the hard disk drive 1540 and the communication interface 1530 via the RAM 1520.

The input/output controller 1584 is also connected to the ROM 1510 and to relatively low-speed input/output devices, namely the flexible disk drive 1550 and the input/output chip 1570. The ROM 1510 stores a boot program that the computer 1500 executes at startup, programs that depend on the hardware of the computer 1500, and the like. The flexible disk drive 1550 reads a program or data from the flexible disk 1590 and provides it to the hard disk drive 1540 and the communication interface 1530 via the RAM 1520. The input/output chip 1570 connects the flexible disk drive 1550, and connects various input/output devices via, for example, a parallel port, a serial port, a keyboard port, or a mouse port.

A program provided to the communication interface 1530 via the RAM 1520 is supplied by a user stored on a recording medium such as the flexible disk 1590, the CD-ROM 1595, or an IC card. The program is read from the recording medium, provided to the communication interface 1530 via the RAM 1520, and transmitted to the output device 140, the imaging device 600, or the output device 1040 via the network. The program transmitted to the output device 140, the imaging device 600, or the output device 1040 is installed and executed there.

The program installed and executed in the output device 140 causes the output device 140 to function as the output device 140 described with reference to FIGS. 1 to 5. The program installed and executed in the imaging device 600 causes the imaging device 600 to function as the imaging device 600 described with reference to FIGS. 6 to 9. The program installed and executed in the output device 1040 causes the output device 1040 to function as the output device 1040 described with reference to FIGS. 10 and 11.

The programs described above may be stored in an external storage medium. In addition to the flexible disk 1590 and the CD-ROM 1595, the storage medium may be an optical recording medium such as a DVD or PD, a magneto-optical recording medium such as an MD, a tape medium, a semiconductor memory such as an IC card, or the like. A storage device such as a hard disk or RAM provided in a server system connected to a dedicated communication network or the Internet may also be used as the recording medium, with the program provided to the computer 1500 via the network.

The present invention has been described above using embodiments, but the technical scope of the present invention is not limited to the scope described in those embodiments. Various modifications or improvements can be made to the above embodiments. It is apparent from the claims that embodiments incorporating such modifications or improvements can also be included in the technical scope of the present invention.

FIG. 1 is a diagram showing an example of a sound providing system.
FIG. 2 is a diagram showing an example of the block configuration of the output device 140.
FIG. 3 is a diagram showing, in table form, an example of the data stored in the explanation sound database 170.
FIG. 4 is a diagram showing an example of the distribution of positions at which images were captured.
FIG. 5 is a diagram showing an example of the distribution of times at which images were captured.
FIG. 6 is a diagram showing an example of the block configuration of the imaging device 600.
FIG. 7 is a diagram showing, in table form, an example of the data stored in the music database 172.
FIG. 8 is a diagram showing an example of the sound recorded by the imaging device 600 and the time ranges.
FIG. 9 is a diagram showing an example of the procedure for acquiring music.
FIG. 10 is a diagram showing an example of the block configuration of the output device 1040.
FIG. 11 is a diagram showing an example of the allowable time.
FIG. 12 is a diagram showing an example of the hardware configuration of the computer 1500.

Explanation of symbols

100 Imaging device
140 Output device
150 Communication line
170 Explanation sound database
172 Music database
174 Environmental sound database
180 User
210 Image storage unit
224 Output unit
262 Sound acquisition unit
278 Image selection unit
282 Imaging region determination unit
284 Imaging period determination unit
286 Imaging position distribution calculation unit
288 Captured image count calculation unit
600 Imaging device
602 Imaging unit
650 Recording unit
660 Condition storage unit
670 Display unit
692 Mode setting unit
694 Feature sound extraction unit
696 Sound acquisition unit
698 Data storage unit
1010 Image storage unit
1020 Sound storage unit
1024 Output unit
1040 Output device
1043 Allowable time setting unit
1048 Output request acquisition unit
1060 Condition storage unit
1094 Feature sound extraction unit
1096 Sound acquisition unit
1044 Output time detection unit

Claims

An imaging unit;
A recording unit for recording sound around the imaging unit;
A feature sound extraction unit that extracts a predetermined type of sound from the sound recorded by the recording unit;
A voice acquisition unit that acquires a voice of the same type as the voice extracted by the feature sound extraction unit from a voice database that stores a plurality of types of voice;
An imaging apparatus comprising: a data storage unit that stores the audio acquired by the audio acquisition unit and the image captured by the imaging unit in association with each other so as to be output in synchronization.

The imaging apparatus according to claim 1, wherein the feature sound extraction unit extracts a predetermined type of sound from the sound recorded by the recording unit within a preset time from the time when the imaging unit captured an image.

The imaging apparatus according to claim 2, further comprising:
A display unit that displays an image of light received by a light receiving element included in the imaging unit; and
A mode setting unit that sets the imaging apparatus to an imaging mode, which is an operation mode in which the display unit is displaying an image, or to a non-imaging mode, which is an operation mode in which the display unit is not displaying an image,
wherein the recording unit records sound around the imaging unit both when the mode setting unit has set the imaging mode and when the mode setting unit has set the non-imaging mode.

The imaging apparatus according to claim 3, wherein the feature sound extraction unit extracts a predetermined type of sound from the sound recorded by the recording unit within a preset time that includes, and is longer than, the time during which the mode setting unit had set the imaging mode.

The voice database stores a plurality of music,
The feature sound extraction unit extracts music from the voice recorded by the recording unit,
The imaging apparatus according to claim 1, wherein the voice acquisition unit acquires the same music as the music extracted by the feature sound extraction unit from the voice database.

The voice database stores a plurality of music by era,
The feature sound extraction unit extracts music from the voice recorded by the recording unit,
The imaging apparatus according to claim 1, wherein the voice acquisition unit acquires music of the same era as the music extracted by the feature sound extraction unit from the voice database.

The voice database stores a plurality of music by genre,
The feature sound extraction unit extracts music from the voice recorded by the recording unit,
The imaging apparatus according to claim 1, wherein the voice acquisition unit acquires music of the same genre as the music extracted by the feature sound extraction unit from the voice database.

A condition storage unit that preliminarily stores conditions for specifying each type of environmental sound extracted by the feature sound extraction unit;
The voice database stores a plurality of environmental sounds for each type of environmental sound,
The feature sound extraction unit extracts an environmental sound that matches the condition stored in the condition storage unit from the sound recorded by the recording unit,
The voice acquisition unit acquires the same kind of environmental sound as the environmental sound extracted by the feature sound extraction unit from the voice database,
The imaging apparatus according to claim 1, wherein the data storage unit stores the environmental sound acquired by the audio acquisition unit and the image captured by the imaging unit in association with each other so as to be output in synchronization.

An imaging stage in which an image is captured by an imaging unit;
A recording stage for recording sound around the imaging unit;
A feature sound extraction step of extracting a predetermined type of sound from the sound recorded in the recording step;
A voice acquisition stage for acquiring a voice of the same type as the voice extracted in the feature sound extraction stage from a voice database storing a plurality of types of voice;
An imaging method comprising: a data storage step of storing the audio acquired in the audio acquisition step and the image captured by the imaging unit in association with each other so as to be output in synchronization.

A program for an imaging apparatus that captures an image, wherein the imaging apparatus is
An imaging unit for capturing an image;
A recording unit for recording sound around the imaging unit;
A feature sound extraction unit for extracting a predetermined type of sound from the sound recorded by the recording unit;
A voice acquisition unit that acquires a voice of the same type as the voice extracted by the feature sound extraction unit from a voice database that stores a plurality of types of voice;
A program that functions as a data storage unit that stores the audio acquired by the audio acquisition unit and the image captured by the imaging unit in association with each other so as to be output in synchronization.

An image storage unit for storing an image captured by the imaging device;
An audio storage unit for storing audio recorded by the imaging device;
A feature sound extraction unit that extracts a predetermined type of sound from the sound stored in the sound storage unit;
A voice acquisition unit that acquires a voice of the same type as the voice extracted by the feature sound extraction unit from a voice database that stores a plurality of types of voice;
An output device comprising: an output unit that outputs the audio acquired by the audio acquisition unit and the image stored in the image storage unit in synchronization.

The image storage unit stores the imaging time of the image in association with the image,
The voice storage unit stores the recording time of the voice in association with the voice,
The output device according to claim 11, wherein the feature sound extraction unit extracts a predetermined type of sound from sound recorded within a preset allowable time from the time when the image was captured.

The output device according to claim 12, further comprising:
An output request acquisition unit that acquires an output request for an image stored in the image storage unit; and
An allowable time setting unit that sets the allowable time longer as the difference between the time when the output request acquisition unit acquired the output request and the imaging time of the image stored in the image storage unit becomes larger.

An image storage stage for storing an image captured by the imaging device;
A voice storage step for storing voice recorded by the imaging device;
A feature sound extraction step of extracting a predetermined type of sound from the sound stored in the sound storage step;
A voice acquisition stage for acquiring a voice of the same type as the voice extracted in the feature sound extraction stage from a voice database storing a plurality of types of voice;
An output method comprising: an output step of outputting the sound acquired in the sound acquisition step and the image stored in the image storage step in synchronization.

A program for an output device for outputting an image, wherein the output device is
An imaging unit for capturing an image;
A recording unit for recording sound around the imaging unit;
A feature sound extraction unit for extracting a predetermined type of sound from the sound recorded by the recording unit;
A voice acquisition unit that acquires a voice of the same type as the voice extracted by the feature sound extraction unit from a voice database that stores a plurality of types of voice;
A program that functions as a data storage unit that stores the audio acquired by the audio acquisition unit and the image captured by the imaging unit in association with each other so as to be output in synchronization.